ABSTRACT

The document presents findings from a comprehensive review of the
literature on the topic of nonbiased assessment. An introductory chapter
describes the review's conceptual framework. Chapters 2 through 9 present
analyses of the following major aspects of the topic (sample subtopics in
parentheses): historical perspectives (ancient influences, nineteenth
century developments, the emergence of differential psychology);
conceptual models of human functioning (seven models of human behavior
that influence contemporary assessment practices); technical test bias
(implications of validation theory, external and internal construct bias);
situational bias in psychological assessment (test-wiseness, examiner sex
and race, motivational factors); outcome bias (prediction of specific
outcomes, selection versus intervention, a variety of selection models);
proposed alternatives to traditional assessment (culture-reduced testing,
renorming, adaptive behavior assessment, Piagetian assessment procedures,
learning potential assessment, clinical neuropsychological assessment,
behavioral assessment strategies); ethical and legal considerations
related to nonbiased assessment of children with learning and behavior
problems (moral principles, invasion of privacy, consent, access to
records, issues in intervention); and the influence of professional
organizations on assessment bias (positions of various professional groups
regarding testing/assessment practices). A final chapter summarizes the
state of the art, considers implications for decision making in special
education, and offers guidelines for practice. (CL)

Conceptual Framework

The use of psychological and educational tests has increased rapidly over
the past few decades. This proliferation is in part documented by the
growth of the Buros' Mental Measurement Yearbook from 400 pages in 1938 to
over 2,000 pages in 1978 (Haney, 1981). Many reasons can and have been
offered for this growth. It has been suggested that the growth of
psychological testing may reflect an aspect of our mass society,
specifically, a need for our institutions to deal with large numbers of
individuals. With regard to the use of tests in education, Garcia (1981)
concludes that proponents of mental measurement believe that "[m]easuring
human abilities by standardized tests would presumably increase
educational productivity and sort the various grades of humans for their
roles in the industrial society" (p. 1172). It has also been suggested
that the reason may, in part, be explained by the love affair that America
has always had with technology, and psychological testing is but one
expression of that special devotion (Boorstein, 1974, cited in Haney,
1981). As Haney (1981) concludes, however, of all the reasons offered for
why psychological and educational testing has grown so rapidly, none
attributes it to the increased ability of psychological science to better
measure mental processes. Given the importance of social utility as an
explanation for the growth of psychological testing rather than increased
quality of the devices themselves, it is no wonder that psychology often
finds itself defensively engaged in research, after the fact, to
demonstrate the utility of its tests in the face of public concern. One
such concern that has been heard most vociferously in recent years is the
perceived biased nature of these tests when used in making decisions about
individuals whose backgrounds are different from those raised in the
mainstream Anglo or dominant culture.

In sorting through the very voluminous literature to identify the reasons
for this concern, two issues stand out most prominently. First, it is
suggested that our society is highly sensitized to any institutionalized
practices that result in a reduction of freedom of choice (Haney, 1981).
When tests are employed to make decisions that may hold back individuals
or groups of individuals from attaining what is often referred to as "the
American dream", potential biases in that process that may result in an
unwarranted denial of freedom are closely scrutinized. Cole (1981) points
out that the concerns the public have towards psychological testing
ultimately focus on the social policy decisions that are made with the aid
of these tests. The use of psychological testing in the schools for the
classification and placement of children in classes for the mentally
handicapped serves as a case in point. Classes for the mentally
handicapped have come to be known as classes having little academic
emphasis, poor facilities, and inadequately trained teachers (McMillan,
1977). Given such a perspective, the use of psychological testing for
placement in these often called "dead-end" educational tracks has received
much scrutiny. This sensitivity combined with the Riverside
epidemiological studies in the early seventies (Mercer, 1970, 1973)
highlighting the disproportional representation of culturally different
children in these classes resulted in demonstrative outcries with the
debate carrying over into the courts.

A second reason for the public's concern for bias in testing lies in the
implications that are inherent in the measured differences between
culturally different children and those from the dominant Anglo culture
(Reschly, 1981). Psychological tests are primarily designed to measure
constructs. Consequently, an implication inherent in the design of tests
purported to be valid is that variation in performance connotes
differences in the measured construct. Such an implication would lead one
to the inevitable conclusion that culturally different children, as a
group, are less capable than Anglo children. When dealing with a construct
such as intelligence that has long been viewed as a characteristic heavily
influenced by genetic endowment, one can appreciate the reason for the
concerns of the public regarding the potential bias in testing. The stigma
attached to labeling a disproportionate number of culturally different
individuals retarded and the perceived insulting nature of a premature
conclusion that one group of people is less intelligent than another is
not only cause for concern but to some, reprehensible.

In response to concerns for potential bias in testing, there are those who
conclude that mean differences across groups are enough to substantiate
charges of bias (Alley & Foster, 1978; Chinn, 1979; .illard, 1979;
Jackson, 1975; Mercer, 1976; Williams, 1974). For example, Alley and
Foster (1978) conclude that for tests to be nonbiased, they need to yield
equivalent distributions of scores across groups. Others have studied the
question of bias by examining both tests and the assessment process in an
effort to determine if measured group differences are "real" differences.
Some have focused on bias in the technical validity sense, some have
looked at bias as a function of situational factors inherent in testing
settings, while others have focused on potential bias in the assessment
process within which testing is often an integral part.

Still others have addressed concerns for bias by devoting their efforts to
proposing and examining alternative methods to traditional testing
practices that purport to be either inherently unaffected
(criterion-referenced testing) or improvements to present practice (e.g.,
renorming). In addition to research efforts, a by-product of the public's
concern has been evolution of a literature related to the judicial and
legislative impact of possible bias in psychological assessment. Official
positions have also been adopted by several organizations whose members
are involved in psychological and educational assessment.

As described above, the response to charges of biased assessment has been
a frenzy of study of the issues in a variety of disparate areas. Given the
inherent unwieldy nature of the literature, periodic reviews primarily
designed to allow for reflection and planning for the future become
increasingly important. It is a major purpose of the report to do just
that. Specifically, the purposes of the present review are fourfold.
First, it is the purpose of this review to be comprehensive in scope. To
that end, all the various and disparate ways that the issue of nonbiased
assessment has been addressed are included. Second, an attempt is made to
provide a conceptual framework for organizing the mass of information
presently available on the topic. Third, a critique is offered of the
writings and research in each of the areas within the framework presented.
Finally, an evaluation of each of the areas within the framework will be
offered to provide an opinion as to the future contribution that each has
yet to make when examined against the evolving trends in the overall area
of nonbiased assessment.

Conceptual  Framework 
When conducting a review of any body of literature, a primary goal is to
develop a framework for conceptually organizing the mass of information
potentially available for inclusion. It was an assumption of the present
authors that the literature should, as much as possible, dictate the scope
of the framework rather than the preconceived biases of the authors.
Consequently, a tentative conceptual framework was postulated at the
outset of the review that was, by design, continually revised upon
examination of the literature.

In an effort to be as comprehensive as possible, the authors allowed their
initial search to range to any literature that purported to be related to
the topic of nonbiased assessment. Searches were conducted in various
disciplines including education, law, sociology, psychology, and medicine
with the only restriction being that the focus of the literature had to be
on measures of mental and/or psychological processes and/or related
behavior as they apply to decisions of selection and/or intervention.

The product of this effort is a conceptual framework that includes eight
major areas, each reviewed in the remaining chapters of this report. These
major areas include: (1) historical perspectives, (2) conceptual models,
(3) technical test bias, (4) situational bias, (5) outcome bias,
(6) proposed alternatives to traditional practice, (7) judicial and
legislative influences, and (8) professional association influences.
Discussed below is each of the eight major areas including a brief
description of the literature covered and the rationale for its inclusion.

Historical Perspectives

The first of these major areas, historical perspectives, reviews the
evolution of psychological and educational assessment and reports on
historical references to biases throughout its developing history. In
order to gain a full appreciation of the issues involved in present day
concerns, it is necessary to acquire an understanding of the development
of psychological measurement and its relationship to the social context in
which it was spawned. Such a [look] into [history] [highlights] the fact
that many of our present day concerns with regard to bias have their
origin in past assessment practice. In addition, this [exploration]
provides us with a [fuller] appreciation of the various ways cultures have
conceptualized human behavior and how contemporary assessment practice has
been influenced by such conceptualizations.

Conceptual Models

The second major area making up the framework for the present review
involves a discussion of the conceptual models that are presently proposed
for understanding human functioning. The medical, intrapsychic disease,
psychoeducational process, behavioral, sociological deviance, ecological
and pluralistic models are reviewed. Different models for conceptualizing
human functioning each have their own assumptions regarding the "why" of
behavior. Each model dictates different assessment approaches, each with
different implications for bias. Consequently each of these models is
described and its implications for bias in its respective assessment
practices are discussed.

Empirical Studies in Bias

One of the more difficult aspects of our task was to come to grips with
the various ways in which bias has been defined and consequently studied
in the empirical literature. Most issues dealing with nonbiased assessment
are emotionally charged and full objectivity is like the proverbial end of
the rainbow - never reached. Yet, the application of scientific method to
the study of bias in assessment has been both plentiful and fruitful. This
gives testimony to the power of science in mediating disputes and sorting
through biases even when those biases are held by those charged with
[... illegible in source ...]

[... illegible in source ...]ization of empirical literature that
attempted to address the question as to whether differences among groups
in their performance on tests are a function of bias or represent "real"
differences. Employing traditional validation theory as a basis for
determining bias, these studies attempt to address the issue of whether or
not tests are measuring the same construct across groups.

The literature also reports on an empirically-based literature that
focuses on potential bias in the outcomes of the entire assessment process
that either may or may not include the use of tests. These studies can be
viewed as employing an expanded version of validation theory that includes
the study of whether or not tests are equally valid across groups when
used to predict desired outcomes. These three areas of the review are
identified as technical test bias, situational bias, and outcome bias. A
brief description of each follows.

Technical Test Bias. By far the most organized search for bias in
assessment has come out of the literature on technical test bias and this
makes up one of the major areas of this review. Technical test bias is
defined as bias in a purely statistical sense. When speaking of testing,
bias refers to "systematic errors in the predictive validity and construct
validity of test scores of individuals that are associated with the
individual's group membership" (Jensen, 1980, p. 375). Thus, those who
choose to examine bias from [... illegible in source ...] a criterion
external to the test. It employs strategies found in predictive validity
studies and asks the question, "Does the test predict external criteria
equally well across groups as would [... illegible in source ...] It is
argued that the fact that different groups perform differently on a test
does not indicate bias. If one suspects bias as a consequence of
differential performance among groups, then one can test the hypothesis to
see if the test predicts some criterion differently for different groups.
If the test predicts differently, the test can be considered externally
biased.
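
The hypothesis test described above can be illustrated with a small
regression comparison. What follows is only a minimal sketch under stated
assumptions: the group names, the scores, and the helper functions
`fit_line` and `predictions_at` are hypothetical and do not come from the
studies reviewed. A separate least-squares line is fitted for each group,
and the predicted criterion values are compared at the same test score.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def predictions_at(groups, score):
    """Predicted criterion value for each group at the same test score."""
    preds = {}
    for name, (test_scores, criterion) in groups.items():
        intercept, slope = fit_line(test_scores, criterion)
        preds[name] = intercept + slope * score
    return preds


# Hypothetical data: (test scores, criterion scores) for two groups.
groups = {
    "group_a": ([80, 90, 100, 110, 120], [40, 45, 50, 55, 60]),
    "group_b": ([80, 90, 100, 110, 120], [46, 51, 56, 61, 66]),
}

preds = predictions_at(groups, score=100)
gap = preds["group_b"] - preds["group_a"]
print(preds, gap)
```

In these invented data the two lines have the same slope but different
intercepts, so a single common line would predict the criterion differently
from the group-specific lines at every test score. In practice one would
also want a significance test on the slope and intercept differences, which
this sketch omits.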

Internal Construct Bias. While external construct bias focuses exclusively
on an external criterion to determine bias, internal construct bias
focuses on the internal structure of the test to determine if the test is
measuring the same thing for all, regardless of group mean differences.
The fact that a test predicts equally well for all provides only partial
verification that the construct the test purports to measure is being
measured in an unbiased manner. In internal construct bias, methods
usually used to support the construct and content validity of tests are
employed to determine if such evidence is different for different groups.
Factor structure bias, distractor bias and item bias have all been studied
[... illegible in source ...] tests can be said to be internally biased.

Situational Bias. Another area that has received a substantial amount of
attention over the years is often referred to as situational bias.
Situational bias, also referred to as atmosphere bias, involves the study
of those influences in the test situation that may interact with group
differences to produce systematic bias in performance across groups.
Jensen (1980) identifies six sources of potential situational bias that
have been studied and are included in this fourth major area of the
review. These include (1) the effects of prior practice or coaching;
(2) interpersonal factors involving the attitude, expectancy and dialect
of the examiner and the manner in which the examinee is motivated to
perform; (3) individual versus group administration and how general
classroom morale and discipline may influence performance; (4) timed
versus untimed tests; (5) the interactions with race and sex of examiner
and examinee; and (6) the potential biasing influence of the halo effect
and its influence on scoring test performance.

This area, like technical test bias, can be viewed as a study of the
validity of tests for use with culturally different populations. Although
independent of the test itself, situational factors have the potential of
impacting on test scores and consequently influencing the validity of the
construct. This literature makes the critical distinction between
performance and capability and asks the question whether or not the
performance on tests of individuals from differing cultures is an accurate
reflection of their capabilities (Henderson & Valencia, in press). From a
social learning perspective, only when motivational conditions are optimal
will the performance and capability of an individual
[... illegible in source ...]. Consequently, if it can be shown that the
performance of individuals from different cultures can be manipulated by
situational factors then the measure may be said to be biased. In essence,
this is a test of the construct validity of the test. That is, is the test
measuring the same construct equally well for all individuals?

Such information is different from that gained from examining internal
construct bias since those methods can only provide evidence from which
one can infer if the same construct is being measured, not how well it is
being measured. Evidence of whether or not one group's performance is
influenced by situational variables would, likewise, not be necessarily
evidenced in an examination of external construct bias. If a situational
factor (e.g., achievement motivation) influences a criterion measure
(e.g., academic achievement) to the same degree that it influences a
construct measure (e.g., intelligence), and if the situational factor
differs among groups, then one would expect the construct measure to
predict the criterion measure equally for all groups even though the
construct might not be measured equally well for all groups.
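
The argument in the preceding paragraph can be made concrete with a small
numerical sketch. Everything in it is an assumption made for illustration:
the ability values, the size of the situational effect, the linear
relation between test and criterion, and the helper names `fit_line` and
`observe` are all hypothetical. The situational factor shifts the test
score and the criterion to the same degree, and its level differs between
the two groups.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def observe(ability, situational_effect):
    """Return (test scores, criterion scores) for one group.

    Assumption: the situational factor shifts performance on the test and
    on the criterion to the same degree, and the criterion is a linear
    function of that shifted performance.
    """
    performance = [a + situational_effect for a in ability]
    test = performance
    criterion = [0.5 * p + 10 for p in performance]
    return test, criterion


ability = [90, 100, 110, 120]          # same true ability in both groups
test_a, crit_a = observe(ability, 0)   # group A: no situational penalty
test_b, crit_b = observe(ability, -8)  # group B: hypothetical penalty

line_a = fit_line(test_a, crit_a)
line_b = fit_line(test_b, crit_b)

# Mean error of the test as a measure of the underlying ability.
bias_a = mean(test_a) - mean(ability)
bias_b = mean(test_b) - mean(ability)
print(line_a, line_b, bias_a, bias_b)
```

Under these assumptions both groups share the same prediction line, so an
external (predictive) analysis would find no bias, yet the test
understates ability in the second group by the size of the situational
effect.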

Outcome Bias. As implied in our previous discussion, when the issue of
nonbiased assessment has been addressed in the past, attention has usually
turned to the study of tests and their validity as defined within the
scope of content, construct and criterion-related validity. Yet, as
defined, a technically valid test provides us limited information on its
usefulness. Indeed, validity in a traditional sense tells us only how well
a construct is being measured, not how useful the measure is in making
decisions. To this point both Cronbach (1980) and Messick (1975) have
emphasized that the different types of validity generally offered to sub-
[... text missing in source ...] variables is to provide further
information regarding whether or not the measure is "acting" the way it is
hypothesized. It is not intended to provide specific information regarding
the value of its use for making a particular type of decision. This point
is exemplified by the rather general nature of the criterion measures
usually chosen to establish predictive validity. For example, in providing
evidence for the criterion-relatedness of a measure of intelligence, a
general measure of academic achievement is often used. Such a measure only
implies the construct's usefulness in educational decision making. Whether
or not the construct measure is useful in any one particular circumstance
has to be determined by the success of the outcome predicted by its use.
The validity regarding the usefulness of a test, then, involves
information beyond that provided in predictive validity studies. In
addition to the traditional psychometric properties of the measure, one
would need to know how well the measure predicts the criterion of concern
in the setting and for the individual for whom one is making the decision.
If intervention planning is the purpose of the use of the measure, then
one also needs to know the extent to which one can predict success of an
intervention designed from the use of the test.

In discussing this point, Cole (1981) suggests that the inability of
technical test validity to provide information regarding all types of
interpretations that can be drawn from a test provides evidence of the
limitations of validation theory. Cronbach (1971) writes:

     Narrowly considered, validation is the process of examining the
     accuracy of a specific prediction or inference made from a test
     score.... More broadly, validation examines the soundness of all
     the interpretations of a test (p. 443, cited in Cole, 1981).

When we broaden our focus of attention in a search for bias to include the
study of outcomes, it becomes readily apparent that the search needs to
encompass more than the study of test bias even when conceived in its
broadest sense. The information provided by tests is only one aspect
bearing on the validity of decision-making and its subsequent outcomes.
Additional data brought to bear on the decision-making process may include
other data on psychological functioning that has no established
reliability and validity (e.g., subjective judgments of a teacher
regarding the intellectual functioning of the child) as well as
philosophic, legal, social, and economic factors. All have their impact on
intended outcomes, and all have the potential of being biased.

With  respect  to  the  latter,   there  are  those  who  point  out  (e.g.^   ^ 

Messick,   1975)   that  while  there  are  numerous  data  in  the  decision-making 
process  for  which   technical  validity  can  be  offered,   there  are  other  data 
related  to  the  values  of  the  decision  makers  and  those  responsible  for 
»     the  decisions  made  that  cannot  be  validated  in  a  psychometric  &ense .  The 
influence  brought  to  bear  on  the  decision  by  these  factors  can  only  be 
judged  by  the  potential  consequences  its  use  will  have  in  terms  of  social 
value.     Cole  (1981)  points  out  that  while  an  intelligence  test  may  be 
valid  for  helping  in  the  diagnosis  of  the  mentally  retarded,  "validity 
theory  does/not  say  whether  tUe  use  of  the  test,  or  the  whole  system  in 
which  the  test  use  is  embed/led,  will  produce  a  social  good  or  a  social  evil" 
(p.  1068). 
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To  further  illustrate  the  complex  of  factors  that  may  impact'  on  an 
intended  outcome,   let  us  examine  a  typical  process  used  to  decide  on  the 

generated. Formal test data will typically include information on the
child's learning potential, adaptive behavior, and academic functioning.
Other test data regarding the child's perceptual and/or social-emotional
functioning may also be collected. Yet, when the ultimate decision is
made regarding the classification and placement of the child, the test
data become only one source of data used to make decisions. Nontest
data  may  include  the  child's  history  of  school  performance,  attendance 
history,  attempts  to  remediate  the  problems  in  the  mainstream  class,  type 
and  quality  of  alternative  placements  available,   subjective  impressions 
of  the  team  members  regarding  the  child's  intellectual  functioning,  the 
parent's support for diagnosis and placement, whether or not the placement
involves  changing  schools,  available  transportation,  number  of  children 
previously  placed  in  such  a  class,  whether  or  not  such  a  diagnosis  and 
placement  will  disrupt  the  proportional  representation  of  minorities  in 
special  education,  among  others. 

In addition to these (and many more) data that are typically
included in a decision-making process are the inferences that one makes
about the data themselves. The fact that an IQ test has been validated by
demonstrating its technical validity, including its relationship to
academic achievement, does not necessarily make it applicable for helping
any one child in any one setting. In this respect, validity questions
include whether or not the external criteria used to validate the test
really speak specifically to the decision made about that child in that
situation. And what about the outcomes of that placement? Does the

use of the assessment data, both test and nontest, lead to a decision

is to improve the learning of that child, then one can effectively argue
that the utility of the data in bringing about this outcome should be an
important aspect of the validity of the data. As a consequence of the
above concerns, there are those who advocate that our definition of
validity be expanded to include the validity in predicting desired
outcomes. From this perspective our study of bias needs to encompass
all aspects of the process lending information for making the decision.

Those  who  advocate  such  a  position  have  expanded  our  arena  of 
empirical  efforts  in  two  ways.     First,   they  have  required  that  we  clearly 
distinguish  between  what  we  actually  know  about  the  measures  we're  using 
(i.e., psychometric properties) and what we are inferring in any given
decision-making circumstance. Second, they have broadened our study of
various types of data, including nontest data, that are used in the
decision-making process. This has focused our attention on how all
sources of data, and the interactions among the data, influence decisions
and subsequent outcomes.

Those  who  share  this  perspective  usually  hold  to  a  more  decision- 
theoretic  model  of  assessment  as  opposed  to  a  classical  test-based  model 
(Cronbach, 1971). In the latter approach to assessment, emphasis is
placed on the accuracy of measurement. It endorses the use of the best
instrument available for measuring a construct regardless of the decision
one needs to make with the data. From this point of view, if one were
interested in the measurement of intelligence, for example, the purpose
would not matter; the measurement of choice would be that instrument
that performs the task most reliably and validly; that is, the instrument

education class proved to be unsuccessful for a child, the problem would
not necessarily focus on the test data. A test can be valid regardless
of ineffectual outcomes. Those who hold this position usually advocate
restricting our definition of bias to technical test bias. The
effectiveness of the intended outcome as it includes philosophic, legal and
other such considerations, or the inappropriate use of test data or other
data, is an issue of "fairness" and "misuse", respectively, not bias.

As stated above, those who argue for a more encompassing definition
of bias tend to hold more of a decision-theoretic model of assessment.
From this perspective, the focus of any assessment, by its nature, is on
the outcomes of the entire process. Information derived from both test
and nontest data, as well as social value considerations, are all an
integral part of the assessment process that cannot be divorced from the
utility of the outcomes. There is no such thing as the perfect test for
measuring anything. The only way one can decide on the appropriateness
of a test is to view it as part of a comprehensive strategy for assessing
individuals for making specific decisions. The validity of a measure
must, therefore, be judged on the effectiveness of the outcomes of any
decisions that employ the measure in this process. Consequently, the
same test or other pieces of data may be valid for making some decisions
while invalid for making others.



Two  different  types  of  outcomes  and  consequently  two  different  types 
of validities can be described. The first relates to the selection of
individuals, the second to intervention with individuals. Likewise, two


Bias in Selection. The major difference between bias in selection
and bias in intervention lies in the purposes of the assessments. When
using tests for selection, the purpose is to identify a test or tests
that will allow one to choose among those who take the test(s). So, for
example, when using a test for selection in hiring or admissions, one's
purpose is to choose among prospective applicants those one wants to
hire or admit and those one doesn't. Since the purpose of the testing
is to hire or admit those who will succeed and not hire or deny admittance
to those who will not succeed, the focus is on the utility of the test
in increasing the probability of making the correct choice. Bias in
selection, then, relates to whether or not the decision-making is
biased in selecting among all who apply, regardless of group membership.

Bias in Intervention. Tests used for decisions involving intervention,
on the other hand, have an entirely different purpose from those used in
selection. The ultimate purpose of this type of assessment is to provide
help to the individual taking the tests. More often than not, formal
help is given through diagnosis and placement. For example, when children
are assessed to determine special education eligibility, any subsequent
diagnosis and placement decisions are ultimately made with the desired
outcome of helping the children. While this may appear on the surface
to be a decision involving selection (i.e., you select those who are eligible
and deny those who are not), it is not. The difference is that in the
intervention process, after the assessment is conducted, something active
occurs (i.e., some form of treatment) that is a direct consequence of the

assessment process. In selection, the assessment process stops with the

well the test predicted the success of an intervention that is planned
with the use of the test. As can be seen, testing for intervention involves
a whole new set of inferences that must be drawn from assessment data.
Bias in intervention, therefore, relates to whether or not the decision-
making is fair in helping all those who are assessed, regardless of group
membership.

Proposed  Alternatives  to  Traditional  Practice 

In response to alleged bias in tests and/or in the assessment process,
a variety of procedures have been proposed as alternatives to traditional
practice. These alternatives include procedural approaches that are
usually represented by both test and nontest-based methods. Some are
designed specifically to address the issues raised in the assessment bias
literature. Others provide alternatives that have not gained popularity
in traditional test practice but have been identified as yielding results
that are less biased than those procedures more commonly employed in
psychoeducational or psychological testing. Some are well-founded procedures
boasting good psychometric properties while others, at best, can be
considered experimental. In addition, the various alternatives that have been
proposed differ radically in the type of data they provide and, therefore,
the purposes for which they can be used. The alternatives reviewed herein
run the gamut from procedures that look and act much like those typically
employed in traditional practices to those that are radical departures.
These include: (1) culture-reduced testing, (2) renorming, (3) adaptive
behavior measures, (4) Piagetian strategies, (5) learning potential
assessment, (6) diagnostic clinical teaching, (7) child development
assessment,

included in the discussion          behavioral assessment

criterion-referenced testing.


Culture-Reduced Tests. These tests are sometimes purported to contain
content that is either free of culture or fair to individuals regardless
of the culture of which they are a member. The aim of culture-reduced
tests is to include content influenced only by environmental circumstances
that are common across cultures. Cattell's Culture-Fair Intelligence
Test is an example of those included in this type. Also included under
the heading of culture-reduced tests are nonverbal tests. Nonverbal
tests are reported to be less biased with multilingual and some physically
handicapped individuals. Several of these tests are now appearing on the
market in response to charges of language bias in testing while others
have been available for years for use primarily with handicapped populations.
The Nonverbal Test of Cognitive Skills is an example of the type of nonverbal
test included in this category.

Renorming. Renorming involves taking an already established test and
providing new norms that are more characteristic of the population of
individuals being tested than the national representative samples that
are most often used to originally norm the test. This alternative is
most  prominently  embodied  in  the  SOMPA. 
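
The arithmetic behind renorming can be sketched briefly. The example below is only an illustration under stated assumptions: the raw scores are invented, the function name standard_score is hypothetical, and the scale (mean 100, standard deviation 15) is a common convention rather than the procedure of any particular instrument such as the SOMPA. It shows how the same raw score yields different standard scores depending on the norm group against which it is compared.

```python
from statistics import mean, stdev


def standard_score(raw, norm_sample, scale_mean=100.0, scale_sd=15.0):
    """Express a raw score relative to a norm sample on a mean-100, SD-15 scale."""
    z = (raw - mean(norm_sample)) / stdev(norm_sample)
    return scale_mean + scale_sd * z


# Hypothetical raw scores, invented purely for illustration.
national_norms = [30, 35, 40, 45, 50, 55, 60]  # national standardization sample
local_norms = [20, 25, 30, 35, 40, 45, 50]     # sample drawn from the child's own population

raw = 35
national = standard_score(raw, national_norms)  # below the national mean
local = standard_score(raw, local_norms)        # exactly at the local mean

print(round(national, 1), round(local, 1))
```

Under these invented figures the child appears well below average against the national sample but average against the local sample, which is the kind of shift in interpretation that renorming is intended to produce.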


Adaptive Behavior Measures. With a reconceptualization of the meaning
of adaptive behavior as evidenced in the 1977 revision of its definition by
the American Association on Mental Deficiency, measures of adaptive behavior
have gained a resurgence of popularity in recent years. Within this new

adaptation to the community as well as the school (Reschly, 1982). This
reconceptualization was in part, if not entirely, the result of potential
bias in the diagnosis of mental retardation when assessment was conducted,
as previously had been done, by an examination of adaptation to the school
culture only. The measurement of adaptive behavior, by law, is now a
necessary component of any diagnosis of mental retardation. An example
of a new measure of adaptive behavior designed specifically to address
the problem of bias in the diagnosis of mental retardation is the Adaptive
Behavior in Children (ABIC) scales, which are part of the SOMPA.

Piagetian Tests. Many of the procedures used to measure constructs
employed in Piaget's theory of intellectual development are less than
more traditional tests of intellectual functioning. The unique feature
of Piagetian tests that makes them candidates for alternative nonbiased
procedures is the nature of the constructs they purport to measure.
According to those involved in this area of research, the constructs are
universal and invariant. Some evidence has been reported
regarding the similarity of cognitive development, as defined by these
measures, of children from diverse cultural backgrounds (cf. De Avila
& Havassy, 1975).
Learning Potential Assessment. Learning potential assessment
procedures involve a test-teach-test model of assessment that differs
dramatically from traditional measures that sample behavior at one point

background of the child. Since this type of assessment has been developed
most extensively by Feuerstein (1978) as a component of an intervention
program, its procedures are less standardized than normally found in
traditional tests.

Diagnostic-Clinical Teaching. Diagnostic-clinical teaching is an
assessment procedure that involves the actual teaching of curriculum-
related materials under conditions that maximize learning. These
conditions can include a variety of manipulations such as varying
reinforcement and feedback conditions. Kratochwill et al. (1980)
report its relevance to nonbiased assessment as a consequence of its
focus on (1) tasks that nearly all children experience in the school
curriculum and (2) its relationship to the interventions that are
planned from it.

Child Development Observation (CDO). Most closely associated with
Ozer and his associates (Ozer, 1966, 1968, 1978), CDO is designed to
simulate the process of learning on protocols that sample conditions
under which a given child's learning problem may be solved. It is a
nontraditional form of assessment and its non-normative approach makes
it an eligible candidate as a nonbiased alternative to traditional
practice.
Clinical Neuropsychological Assessment. Clinical neuropsychological
assessment is concerned with the assessment of brain-behavior relations.
As such, it can be conceptualized as a set of procedures best interpreted

influence variations in culture. The procedures themselves depend on
standardized behavioral observations used in conjunction with normative
psychological assessment devices.

Behavioral Assessment. Most commonly associated with behavior therapy
approaches, behavioral assessment has been identified with nonbiased
assessment since it involves a set of procedures that sample behaviors
that are most often referenced to an absolute standard of performance.
The sample is usually taken in the natural environment and the desired
standard of performance established with either a person responsible for
the individual's behavior or the individual himself/herself.

Criterion-Referenced Tests. While not originally designed specifically
as nonbiased measures, the assumptions underlying the development of
criterion-referenced tests make them candidates for such use. This class
of measures, unlike traditional norm-referenced tests, does not depend on
comparing children in the assessment of abilities and skill-level achievement.
Instead, criterion-referenced tests measure the extent to which a child has
mastered an absolute preestablished standard of performance. These tests
are sometimes referred to as domain-referenced tests.
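
The contrast between the two kinds of interpretation can be made concrete with a small sketch. The figures and function names below (percentile_rank, has_mastered) are hypothetical, and the 80 percent mastery criterion is an assumption chosen only for illustration. The norm-referenced reading locates a child relative to other children; the criterion-referenced reading asks only whether a preestablished standard has been met.

```python
def percentile_rank(score, norm_scores):
    """Norm-referenced reading: percent of the norm group scoring below the child
    (scores tied with the child count as half)."""
    below = sum(1 for s in norm_scores if s < score)
    equal = sum(1 for s in norm_scores if s == score)
    return 100.0 * (below + 0.5 * equal) / len(norm_scores)


def has_mastered(items_correct, items_total, criterion=0.8):
    """Criterion-referenced reading: has the child reached the preset standard?"""
    return items_correct / items_total >= criterion


# Hypothetical scores of ten children on a 20-item test.
norm_scores = [12, 14, 15, 16, 17, 18, 18, 19, 20, 20]

child_score = 17
rank = percentile_rank(child_score, norm_scores)  # position relative to other children
mastered = has_mastered(child_score, 20)          # independent of how others performed

print(rank, mastered)
```

In this invented case the child falls below the middle of the comparison group yet meets the mastery standard, so the two interpretations of the same score lead to different conclusions.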
Judicial and Legislative Influences

One by-product of the public's concerns over perceived bias in
psychological and educational assessment has been the involvement of

practice. However, since in the area of judicial actions the influence of
rulings in one area of the application of testing is felt in all areas,
the discussion will also extend to those court cases that have had an
indirect, yet significant, impact on assessment bias in education. This
is especially true of those rulings on tests in the area of employment.

The impact of legislative and judicial actions on psychological
testing in education should not be underestimated. Since the mid-1960s,
a wealth of litigation and legislation has evolved that has affected the
administration, interpretation and use of psychological tests (Bersoff, 1981).
Legislative actions such as the Civil Rights Act of 1964 and P.L. 94-142
are two of the more prominent laws that are presently reviewed.

In the area of judicial action, the courts, which have traditionally
attempted to maintain a "hands-off" posture with respect to issues involving
school policy, have recently jumped into the arena "with both feet". Hearing
cases on both statutory and constitutional grounds, the courts have steadily
increased their involvement in the fair use of psychological testing and will
apparently continue to do so (Bersoff, 1981). For the purposes of this
review, major attention has been focused on the Larry P. v. Riles (1979)
and PASE v. Hannon (1980) cases. These cases most directly impact on the
use of intelligence tests for the diagnosis and placement of children in
"educable mentally retarded" classes.
Professional Association Influences

The eighth and last major area within the conceptual framework that
will be reviewed involves the influence of professional associations on
the biased assessment practices of their members. The impact of professional
associations is usually felt through training programs, public statements,
published guidelines and their impact on the certification and licensure
of those who qualify to administer tests or provide professional services.
In this area of the review some of the professional groups that have set
forth standards for assessment practices for their members will be examined.

Structure  of  the  Report 

Our review of assessment bias is composed of 10 chapters. Chapter 1,
the present chapter, was designed as an introduction to the report. Its
purpose was to detail the conceptual framework that evolved as a consequence
of our review and that has provided the basic structure for this report.
Chapters 2 through 9, inclusive, contain a discussion of the major areas
reviewed. Chapter Two reports on the historical perspectives on bias in
assessment while the conceptual models of human functioning are reviewed
in Chapter Three. Chapters Four, Five, and Six report on empirical studies
in assessment bias in the areas of technical test bias, situational bias,
and outcome bias, respectively. The various alternatives to traditional
test practice are reviewed in Chapter Seven. Chapter Eight reviews the
legislative and judicial influences on bias in testing and the influence
of professional organizations is reported in Chapter Nine. In Chapter Ten,
a synthesis of all the major areas is attempted to provide the reader various
perspectives on where we have come in our understanding of bias in assessment
and, more importantly, where we still have to develop new areas of research
and practice.
Chapter 2


Historical Perspectives on
Assessment Bias


Since the beginning of assessment efforts, individuals have been
concerned with how fair the actual procedure or technique was for those
participating in it. In this chapter we trace the development of
assessment over recorded history up to the present. Although our
overview is quite focused (see several sources for a more detailed
general review: DuBois, 1970; Doyle, 1974; Linden & Linden, 1968;
McReynolds, 1975), we provide a perspective on contemporary bias in the
assessment process.

An  examination  of  the  historical   factors   in  assessment  is 
important   for  several   reasons.     First,    it  is   important  to  understand 
that  many  of  the  contemporary  issues   in  assessment  bias  have  their 
origin in past assessment practices. Second, it is important to
realize  that  many  contemporary  issues  are  related   to  social  or  even 
political  concerns  that  have  their  origin   in  the  past.     Third,   the  past 
has  sometimes  provided  or  even  imposed  a  structure  on  assessment 
practices.      It   is   important  to  understand  this  structure   in  order  to 
identify contemporary models of assessment practice. Finally, it is
important   to  focus  on  historical  factors  to  introduce  a  variety  of 
scholarly  perspectives  into  the  discussion  of  the  issues  surrounding 
bias  in  assessment. 
Ancient Influences

One of the most extensive and scholarly discussions of the
historical antecedents of assessment in general, and personality
assessment specifically, has been presented by McReynolds (1971). Most
historical treatments of the assessment literature typically begin with
a discussion of the work of Galton in England and Cattell in the United
States [i.e., many books on assessment begin with this period (e.g.,
Sundberg, 1977)] and historical tables reflect this perspective.
However, assessment has a much richer history, attesting to the
assumption that many features of contemporary assessment actually date
back to the beginnings of recorded history. McReynolds (1974) traced
the historical antecedents of the current practices in assessment
beginning with antiquity and extending to the latter part of the last
century. Four phases are reviewed, namely, antiquity, the medieval
period and the Renaissance, the Age of Reason, and the period from
Thomesius to Galton.

Antiquity

An  examination  of  early  assessment  practices  shows  that  there  was 
a  close   interplay  between  the  methods  employed  and  the  cultural  views 
held  during  that  particular  time.     This  is  not  unlike  the  contemporary 
views in the United States that led to the development of P.L. 94-142
with its emphasis on fair assessment practices for handicapped children.
It is possible that the first personality assessment procedure was
based on astrology, and that the first psychological "test" was the
horoscope. Although astrology can be regarded as invalid on scientific
grounds (and possibly a biased assessment procedure), it did
contribute (a) the view that individual personalities represent the
focus of assessment, (b) the view that the psychological make-up of the
individual is predetermined, and (c) the development of taxonomical
categories.

Another early assessment strategy involved physiognomy, the
interpretation of an individual's character from body physique.
Physiognomics, also a very limited assessment procedure, assumes a
relatively fixed conception of personality, but shares some
methodological features with contemporary naturalistic observation, as
represented in behavior modification procedures (see discussion in
Chapter 7). McReynolds (1974) noted that the longest continuous
assessment technique with some claim to rationality and one that
remains with us today is physiognomy. Thus, recent work such as that
by Mahl (1956) and Gleser, Gottschalk, and Springer (1961) on speech
patterns; by Hall (1959), Eibl-Eibesfeldt (1971) and Haas (1972) on the
ethology of movements; of Izard (1971) and Ekman and associates (Ekman,
1973; Ekman, Friesen, & Ellsworth, 1972) on emotions and facial
expressions; and Hess and associates (Hess & Polt, 1960; Hess, Seltzer,
& Schlien, 1965) on the relation of pupil size to affect, can be
related to earlier physiognomic conceptions (cf. McReynolds, 1974).

Developments in assessment during early times were not always
limited to the area of "personality assessment". For example, Civil
Service examinations were used in ancient China for selection purposes.
DuBois (1966) notes:

The earliest development seems to have been a
rudimentary form of proficiency testing. About
the year 2200 B.C. the emperor of China is said
to have examined his officials every third year...
A thousand years later in 1115 B.C., at the beginning
of the Chan dynasty, formal examining procedures
were established. Here the record is clear. Job
sample tests were used requiring proficiency in
the five basic arts: music, archery, horsemanship,
writing, and arithmetic. Knowledge of a sixth art was
also required - skill in the rites and ceremonies
of public and social life (pp. 30-31).

Medieval Period and the Renaissance

McReynolds   (1974)    notes  that  during  this  period,   the  acceptance  of 
humoral  psychology  and  physiognomic  strategies  of  evaluating  people  was 
widespread.     Generally,  this  period  supported  the  recognition  of  the 
individual  and  so  we  again  see  an  example  of  cultural   influences  on 
assessment practice.

Age  of  Reason 

The  Age  of  Reason  covers  the  period  from  approximately  the  middle 
of the sixteenth century to the latter part of the eighteenth. A major
theme of this period was the focus on individual differences as
reflected in some important works on assessment: Huarte's Tryal of
Wits, Wright's Passions of the Minde, and Thomesius' New Discovery.
During this period, the recognition of individual differences prompted
measurement so that an individual's happiness could be more fully
realized.

From Thomesius to Galton

A significant contribution to assessment during this period,
particularly in the nineteenth century, was phrenology. Phrenology
bears similarity to physiognomy. While physiognomy emphasized
assessment of external body features such as facial and other
characteristics, phrenology emphasized the assessment of the external
formations of the skull. However, phrenology assumed that mental
functions were based on specific processes localized in certain areas
of the brain and that the intensity or magnitude of these functions was
indicated in the contours and external topography of the skull
(McReynolds, 1974).

Four  positive  contributions  of  phrenology  that  have  a  resemblance 
to contemporary assessment practices or activities can be identified
(McReynolds, 1974). First, there was an emphasis on individual
differences.     Second,   the  assessment  paradigm  emphasized  the  notions  of 
assessor  and  subject,   the  systematic  collections  of  data  during  a 
single  session,   and  written  reports  which  usually  included  qualitative 
profiles. Third, the phrenological movement helped advance
"objectivity" through "blind assessment" and rating scales. Finally,
phrenology contributed to the development of a primitive taxonomical
system such as affective faculties (e.g., propensities, sentiments) and
intellectual faculties (e.g., perceptive, reflective).

Implications

This brief historical overview of ancient influences points out
that many contemporary assessment practices have their roots deep in
our past. Noteworthy is the fact that the work of the phrenologists
(and later Quetelet's work on psychological statistics) set the stage
for the emergence of Galton's contributions and the more modern era in
assessment. It is interesting to speculate how some of the ancient
procedures might have been perceived as biased or discriminatory.
McReynolds (1974) raises an interesting point:
We know that such techniques as chiromancy, metaposcopy,
and phrenology are in principle all totally invalid,
yet I suggest that in the hands of insightful and
discerning practitioners they may, at least on occasion,
have been more valid than we suppose, even if for different
reasons than their users, much less their clients,
imagined (pp. 524-525).

Nineteenth Century

During the nineteenth century significant developments were taking
place in Western Europe and the United States that would shape the
future of psychological and educational assessment (cf. Carroll, 1978;
Laosa, 1977; DuBois, 1970). Specifically, events were occurring in
France, Germany, England, and the United States that were to have a
profound influence on assessment practices in psychology and education.

France

Two movements arose in France that made a significant impact on the
history of testing and assessment (Maloney & Ward, 1976). One
movement, pioneered by Bernheim, Liebeault, Charcot, and Freud, was
focused on a new view of deviant behavior. The influence of this
movement was to take abnormal behavior out of the legal or moral realm
with which it had been previously associated and cast it as a
psychological or psychosocial problem. This prompted psychological
assessment rather than moral or legal sanction, as had been common
prior to this period.

Also noteworthy was the movement called "The Science of
Education." Jean Itard, a French physician, taught Victor, the "Wild
Boy of Aveyron," various skills. Many of the procedures used in Itard's
work were similar to more contemporary behavior modification procedures
which emphasize environmental stimulus and response changes during
instruction. Itard's contributions also provided a background for
Binet's work on measurement of intelligence.

Esquirol's (1772-1840) work, represented in his book Des Maladies
Mentales, was influential in that he distinguished between "emotional
disorders" and "subaverage intellect." According to his views,
subaverage intelligence consisted of levels of individual performance:
(a) those making cries only, (b) those using monosyllables, and (c)
those using short phrases, but not elaborate speech. Thus, here we see
the basis for an early classification scheme that could organize human
behavior.

Germany

While some of the work in France emphasized individual differences
in pathology and cognitive ability, German scientists perceived
individual differences as a source of measurement error. A significant
contribution to the individual differences theme is found in the
"Maskelyne-Kinnebrook affair." The difference between Maskelyne (the
astronomer) and Kinnebrook (the assistant) in their measurement of the
timing of stellar transits was later analyzed by Bessel. Bessel
concluded that different persons had different transit tracking times,
and that when all astronomers were checked against one standard,
individual error could be calculated; a sort of "personal equation" was
developed (cf. Boring, 1950).

Another significant influence on assessment came from Wundt, who
set up a psychological laboratory in Leipzig to study such processes as
reaction time, sensation, psychophysics, and association. This work,
as well as the general work occurring on measurement, was helpful in
popularizing the notion of measurement of differences between
individuals. Some Americans who studied with Wundt were G. Stanley
Hall and James McKeen Cattell. Both of these individuals were to have
a large impact on future psychological assessment.

England

The work of Charles Darwin was most influential in psychological
and educational assessment, particularly in his theory of evolution
presented in Origin of the Species. Darwin's work emphasized
that there are measurable and meaningful differences among members of
each species. Galton, Darwin's half-cousin, was most influential in
applying evolutionary theory to humans. In his book, Hereditary Genius
(1869), he argued that "genius" had a tendency to run in families.
Galton was greatly influenced by the Belgian statistician Quetelet
(1796-1874) who was the first to apply the normal probability curve of
Laplace and Gauss to human data. This translated into the notion of
"l'homme moyen" or the notion of an "average man" (Boring, 1950). In
this view, nature's mistakes were represented as deviations from the
average.

Several   implications  of  this  work  are  noteworthy.     First,  Galton's 
system  of  classification  represented  a  fundamental  step  toward  the 
concept  of  standardized  scores   (Weisman,  1967).     Second,   in  the 
application  of  Quetelet's  statistics,   Galton  demonstrated  that  many 
human variables, both physical and psychological, were distributed
normally. This is a direct precursor to the concept of a norm and
application of standardization (Laosa, 1977). Third, a major influence
of this work was to establish that certain variables should be
subjected to quantitative measurement. Galton's work was significant
in that it encouraged other efforts in the area of measurement of
individual differences in mental abilities that were considerably more
sophisticated than previous efforts (Cooley & Lohnes, 1976). Finally,
through the application of the normal curve, individual performance or
standing could be classified as deviant or even as a mistake of nature.
We know that although Galton was influenced by the phrenologists, he
rejected this form of assessment. He noted in 1906, "Why capable
observers should have come to such strange conclusions (can) be
accounted for...most easily on the supposition of unconscious bias in
collecting data" (Quoted in Pearson, 1930, Vol. IIIb, p. 577).
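
The idea of a standardized score that grew out of Galton's use of the
normal curve can be illustrated with a brief sketch. It expresses each
raw score as its distance from the group average in standard deviation
units. The function name and the sample data are hypothetical and are
not taken from the sources reviewed here.

```python
import statistics


def standard_scores(raw_scores):
    """Convert raw scores to standard (z) scores.

    Each score is expressed as its deviation from the group mean,
    divided by the population standard deviation of the group.
    """
    mean = statistics.mean(raw_scores)
    sd = statistics.pstdev(raw_scores)
    if sd == 0:
        raise ValueError("all scores are identical; z-scores are undefined")
    return [(score - mean) / sd for score in raw_scores]


# Hypothetical raw scores for eight examinees (illustrative only).
scores = [2, 4, 4, 4, 5, 5, 7, 9]
z = standard_scores(scores)
```

In this illustration the group mean is 5 and the standard deviation is
2, so the examinee with a raw score of 9 stands two standard deviations
above the average. Whether such a standing is labeled "deviant" is a
matter of interpretation and not of arithmetic.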

United States

Early work in the United States contributed to what was called the
"Mental   Testing"  movement.     Cattell    (1860-1944)    was  the   first   to  use 
the  term  "mental   test"   and  he   is  generally  referred   to  as  the  father  of 
mental   testing    (DuBois,   1970;   Hunt,   1961).     Cattell  also  introduced 
experimental  psychology  into  the  United  States.     A  significant 
contribution  to  assessment  was  that  he  advocated  testing   in  schools;  he 
was  also  generally  responsible  for   instigating  mental   testing  in 
America   (Boring,  1950). 

In   1895  Cattell  chaired   the  first  American  Psychological 
Association  Committee  on  Mental   and   Physical   Tests.       Although  Cattell 
made  major  changes  in  the  nature  of  testing,  his  work  was  not  accepted 
unconditionally. For example, Sharp (1899) published an article
questioning the reliability of mental tests. Wissler (1901) compared
the reliability of some of Cattell's psychological measures with
various measuring approaches from the physical sciences and concluded
that tests used in Cattell's lab showed little correlation among
themselves, did not relate to academic grades, and were unreliable (cf.
Maloney & Ward, 1976). Even Wundt was not supportive of Cattell's
focus on mental measurements (Boring, 1950). Nevertheless, Cattell's
work,   as  well   as  other  work   in   France,   promoted   the  development  of  a 
movement  called  differential  psychology. 

Differential  Psychology 
Applications in Education

Around   the  turn  of  the  century,   assessment  was  again  given  a  new 
impetus  through  the  development  of  differential  psychology  (Binet  & 
Henri, 1895; Stern, 1900, 1914). Stern (1914) suggested that mental
age be divided by chronological age to produce a "mental quotient," a
procedure, with refinements, that has evolved into the IQ of today
(Laosa, 1977).
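
Stern's proposal can be written out as a short sketch. The function
names are illustrative, and the multiplication by 100 in the second
function reflects the later ratio-IQ convention, which is an assumption
added here because the text mentions only unspecified "refinements."

```python
def mental_quotient(mental_age, chronological_age):
    """Stern's mental quotient: mental age divided by chronological age."""
    if chronological_age <= 0:
        raise ValueError("chronological age must be positive")
    return mental_age / chronological_age


def ratio_iq(mental_age, chronological_age):
    """Ratio IQ (assumed later convention): the mental quotient times 100."""
    return 100 * mental_quotient(mental_age, chronological_age)


# Hypothetical case: a child of 8 years performing at a 10-year level.
quotient = mental_quotient(10, 8)  # 1.25
iq = ratio_iq(10, 8)               # 125.0
```

A child whose mental age equals his or her chronological age obtains a
quotient of 1.0 under this scheme, regardless of the age at testing.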

The work of Binet and his associates was quite influential,
although not necessarily in the direction that Binet had envisioned or
desired (cf. Sarason, 1976; Wolf, 1973). Binet initially focused his
efforts on the diagnosis of mentally retarded children around the late
1880's. At this time he was assisted by Theodore Simon, with whom he
later worked in the development of the first formal measure of
intellectual assessment for children (Wolf, 1973). Based on a study
conducted for the Ministry of Public Instruction, he focused efforts on
predicting which child would be able to succeed in school (Resnick,
1982). Binet noted that performance on his scale had implications for
classification and education. Resnick (1982) notes:

A set of thirty questions was developed, each
of increasing difficulty. Idiots were those who
could not go beyond the sixth item, and imbeciles
were stymied after the twelfth. Morons were found
able to deal with the first twenty-three questions.
They were able to do the memory tests and arrange lines
and weights in a series, but no more...the test...was
designed as an examination to remove from the mainstream
of schooling, and place in newly developed special classes
for the retarded, those who would be unable to follow the
normal prescribed curriculum. As such, it was a test for
selection, removing from normal instruction those with the
lowest level of ability. Binet argued, however, that the
treatment the children would receive in the special classes
would be more suited to their learning needs. The testing,
therefore, was to promote more effective and appropriate
instruction (p. 176).

Around the turn of the century, interest in testing the abilities
of children was at a high level. This was prompted, in part, by the
growing population of children in schools due to natural population
growth and immigration (Trow, 1966), and the fact that students began
to stay in school longer (Chapman, 1979). With the growing number of
children in schools, it became clear that not all children could profit
from regular instruction. A Senate Committee reported in 1908 that
approximately 72% of all foreign-born public school students in New
York (and in many other cities it was close to 50%) (Tyack, 1974) could
profit from special instructions.

Several American psychologists promoted Binet's work. For
example, Henry Goddard published the first revision of the Binet scale
and Terman developed the Stanford-Binet. Thereafter the Binet scale
was used to identify children who were regarded as "backwards" or
"feebleminded". Wallin (1914) reported that in 1911 the Binet was
being used in 71 of 84 cities that administered tests to identify
"feebleminded" children. However, the Binet scale was also being used
experimentally to screen out and turn back retarded immigrants (Knox,
1914, cited in Wigdor & Garner, 1982).

The Stanford version of the Binet-Simon Scale was originally
published in 1916 by Terman and this scale was revised by Terman and
Merrill in 1937 and 1960 and renormed in 1972. This translation and
revision of Binet's earlier work firmly established intelligence
testing in schools and clinics throughout the United States (DuBois,
1970). It is possible that work building on these developments led
directly to many of the issues surrounding bias in assessment practices
in psychology and education today. Sarason (1976) notes:

School psychology was born in the prison of a test and
although the cell has been enlarged somewhat, it is still
a prison. Alfred Binet would have been aghast, I think, to
find that he gave impetus to a role which became technical and
narrow, a role in which one came up with analyses, numbers,
and classifications which had little or no bearing on what
happened to children in the classroom. Of course, it makes
a difference if, on the basis of testing, a child is put in
a special class of some kind and we certainly have a variety
of types--but even here Binet would probably have asked what
bearing had the child's performance on the specific
educational plan which he required (p. 587).


Development of Group Testing


The assessment movement was given a major thrust through the
development of group tests during World War I (WWI). Many assessment
efforts during this time reflected a pattern of procedures similar to
that used by Binet (Newland, 1977). Ebbinghaus demonstrated the
feasibility of group tests and some American psychologists (e.g.,
Whipple, 1910; Otis, 1918) recognized that the Binet-Simon Scale could
be adapted for group testing. However, there were important
differences. Whereas the Binet-type items typically required a
definite answer provided by the child, group tests usually called for
recognition of a correct answer among several alternatives (Carroll,
1978).

A committee of the American Psychological Association, chaired by
Robert M. Yerkes, developed the Army Alpha and Army Beta group tests.
The Army Beta (a nonverbal group test) was designed so as not to
discriminate against illiterates and individuals speaking foreign
languages. While the impact of this development was to create a new
interest and role in testing, a review of the tests used (cf. Yerkes,
1921) reveals the source of many tests that were increasingly used for
non-military purposes (Newland, 1977).

Following the war, many psychologists who were involved in wartime
testing became involved in the schools. Resnick (1982) notes:

Aiding this movement was Philander P. Claxton, U.S. Commissioner
of Education, who circularized school superintendents throughout
the country about the reserve of trained people that could be
tapped for the needs of the schools. He wrote enthusiastically
about the "unusual opportunity for city schools to obtain the
services of competent men..." Among the services they could
render was "discovering defective children and children of
superior intelligence..." (p. 183).

This  movement,   in  part,   facilitated  the  use  of  group  intelligence  tests 
in  the  public  schools.     Many  of  these  tests  were  administered  to 
identify  children  who  could  not  profit  from  regular  instruction. 
Although  some  schools  had  made  provisions  for  special  children  (Wallin, 
1914), the intelligence tests served a role to formalize the decision
making process for these special services. Also, between 1919 and
1923, Terman introduced the National Intelligence Test for grades three
to eight, and the Terman Group Test for grades seven to twelve, and
found that the schools were most receptive (Resnick, 1982). Resnick
(1982) reports that the most important use of the tests was for
placement of children in homogeneous groups:

Sixty-four percent of the reporting cities used group intelligence
tests for this purpose in elementary schools, 56 percent in
junior high schools, and 41 percent in high schools. Enthusiasm
for the use of testing systemwide for this purpose was at a high
level. In 1923, Terman's group test for grades seven to thirteen
sold more than a half-million copies (pp. 184-185).

Spearman developed an elaborate theory of the organization of human
abilities in which he concluded that all intellectual abilities have a
common factor, g, and a number of specific factors, s, which relate
uniquely to each specific ability. Spearman's two-factor theory was the
basis upon which tests examining specific abilities (Edwards, 1971)
rather than global measures were developed (Laosa, 1977).

Thorndike viewed intelligence as comprised of a multitude of
separate elements, each of which represented a specific ability.
Intelligence was also perceived as having both hereditary and
environmental components. Thurstone concluded that there were seven
primary mental abilities (in contrast to Spearman's s factors) and
developed the Primary Mental Abilities Test to measure each specific
ability.

Intelligence tests gradually evolved into major diagnostic
instruments throughout the world. Such instruments became a major
diagnostic tool for identifying the retarded for psycho-educational
research and service (cf. UNESCO, 1960). However, not all countries
accepted their use: In the Soviet Union such tests were banned in 1936
by the Communist Party because they were considered methods which
discriminated against the peasants and the working class in favor of
the culturally advantaged (Sundberg, 1977; Wortis, 1960). As an
alternative, diagnosis was based primarily on neuro-physiological
evidence. The neurologist and psycho-physiologist, rather than
clinical psychologist, were primarily engaged in diagnosing the
mentally retarded (cf. Dunn & Kirk, 1963).

Work in these areas as well as other contributions prior to and

and its assessment. A major contribution to the testing movement was
the development of the Wechsler Scales. Wechsler developed the
Wechsler Adult Intelligence Scale (WAIS) by including a group of
sub-tests from WWI vintage which were found valuable in his work with
adults. His criterion of "general adaptability" (cf. Wechsler, 1975)
was extended downward in the development of the Wechsler Intelligence
Scale for Children (WISC and WISC-R) and the Wechsler Pre-School and
Primary Scale of Intelligence (WPPSI). The work of Wechsler contrasted
with that of Binet. Whereas Wechsler's Scales emerged from work with
adults  and  were  later  developed   for  use  with  children,   Binet's  emerged 
from  work  with  young  children  and  later  was  developed  for  use  with 
older  children   (Newland,   1977).     This  has  led  to  an  important 
differentiation  that  has  implications   for  assessment: 

The  perception  of  tested   intelligence   in  adults  today 
has  hampered  and  diluted  the  perception  of  tested 
learning  aptitude   in  children.     And  yet,   in  spite  of 
the  fact  that  so  many  different  measures  are  objectively 
obtained  on  children,  such  results  are  used   in  research 
along with those obtained otherwise on adults as though
they were interchangeable (Newland, 1977, p. 6).

Newland (1977) suggests that "learning aptitude" (in the sense of
school learning aptitude) is a much better criterion for "child
intelligence" than the adult connotation of multi-faceted
susceptibility of adaptation or potential of adults.

Political Aspects of the Assessment Movement

The testing movement has not been confined to issues bearing on
the psychometric features of tests themselves. Test results and data
from testing research have been used for political or even racial
positions. Many European and American scientists (anthropologists,
biologists, and psychologists) have held racial positions (Chase,
1977), and this has been documented specifically with testing the IQ of
individuals (Block & Dworkin, 1976; Eckberg, 1979; Gould, 1978; Kamin,
1974).

Many psychologists interpreted the intelligence test data from WWI
as evidence for genetic differences among races and, within the
Caucasian race, among different nationality groupings (e.g., Brigham,
1923). However, some of the interpretations were later retracted
(e.g., Brigham, 1930). Indeed, the notion that intelligence or
scholastic aptitude reflected largely the effects of native endowment
in interaction with schooling was generally slow in development (cf.
Carroll, 1978; e.g., Peterson, 1925).

Nevertheless,  a  variety  of  oppressive  positions  by  "respected" 
individuals  were  presented  during  the  history  of  testing,  as  these 
statements  indicate: 

(W)e are incorporating the negro into our racial stock, while
all of Europe is comparatively free from this taint...the steps
that should be taken...must of course be dictated by science
and not by political expediency...the really important steps are
those looking toward the prevention of the continued propagation
of defective strains in the present population (Brigham, 1923).

[... children's] dullness seems to be racial,
or at least inherent in the family stocks from which they come.
The fact that one meets this type with such extraordinary
frequency among Indians, Mexicans and negroes suggests quite
forcibly that the whole question of racial differences in mental
traits will have to be taken up anew...there will be discovered
enormously significant racial differences which cannot be wiped
out by any scheme of mental culture.

Children of this group should be segregated in special
classes...they cannot master abstractions, but they can often
be made efficient workers...There is no possibility at present of
convincing society that they should not be allowed to
reproduce...they constitute a grave problem because of their
unusually prolific breeding (Terman, 1916, p. 6).

Now the fact is, that workman may have a ten year intelligence
while you have a twenty. To demand for him a home as you enjoy is
as absurd as it would be to insist that every laborer should
receive a graduate fellowship. How can there be such a thing as
social equality with this wide range of mental capacity?
...The man of intelligence has spent his money wisely, has saved
until he has enough to provide for his needs in case of sickness,
while the man of low intelligence, no matter how much money he
would have earned, would have spent much of it foolishly...During
the past year, the coal miners in certain parts of the country
have earned more money than the operators and yet today when the
mines shut down for a time, those people are the first to suffer.

them that mining is an irregular thing and that...they should
save.... (Goddard, 1920, p. 8)

Never should such a diagnosis (of feeblemindedness) be made on the
IQ alone....We must inquire further into the subject's economic
history. What is his occupation; his pay....We must learn what we
can about his immediate family. What is the economic status or
occupation of the parents? ...When...this information has been
collected...the psychologist may be of great value in getting the
subject into the most suitable place in society... (Yerkes, 1923,
p. 8)

Goddard reported that, based upon his examination of the "great
mass of average immigrants," 83% of Jews, 80% of Hungarians, 79%
of Italians, and 87% of Russians were "feebleminded" (Goddard,
1913) (in Kamin, 1975, p. 319)

That part of the law which has to do with the nonquota immigrants
should be modified....All mental testing upon children of
Spanish-American descent has shown that the average intelligence
of this group is even lower than the average intelligence of the
Portuguese and Negro children...in this study. Yet Mexicans are
flowing into the country...

From Canada we are getting...the less intelligent of the
working-class people....The increase in the number of French
Canadians is alarming. Whole New England villages and towns are
filled with them. The average intelligence of the French Canadian
group in our data approaches the level of the average Negro
intelligence.

I have seen gatherings of the foreign-born in which narrow and
sloping foreheads were the rule....In every face there was
something wrong - lips thick, mouth coarse...chin poorly
formed...sugar-loaf heads...goose-bill noses...a set of skew-molds
discarded by the Creator....Immigration officials...report vast
troubles in extracting the truth from certain brunette
nationalities (Hirsch, 1926, p. 28).

Such positions clearly have degraded scientific attempts to deal
with the nature-nurture issue. Increased controversy has surrounded
such notions as intelligence being fixed and predetermined (Hunt,
1961), or being influenced by environmental or social forces. The
"nature-nurture controversy" was given increased momentum in 1969 in
Arthur Jensen's Harvard Educational Review article "How Much Can We
Boost IQ and Scholastic Achievement" in which he discussed the relative
contribution of genetic and environmental factors on IQ. Jensen (1969)
indicated that (a) compensatory education for disadvantaged groups had
"apparently" been a failure, (b) there was evidence to "make it a not
unreasonable hypothesis that genetic factors are strongly implicated in
the average Negro-White intelligence difference" (p. 82), and (c) the
race differences were evident in conceptual ability (Level II), but
not in associative ability (Level I). Despite continued attacks,
Jensen has defended his position (cf. Jensen, 1973a, 1973b).
Unfortunately, statements continued to support racial perspectives.
Shockley (1971) noted that "Nature has color coded groups of
individuals so that statistically reliable predictions of their
adaptability to intellectually rewarding and effective lives can easily
be made and profitably be used by the pragmatic man in the street" (p.
375). Thus, although research will continue to have a bearing on
issues related to test bias, the legacy from the past and present will
likely influence any scientific analysis of the issues. Indeed,
science occurs in a social context and it is that context that must
continually be questioned (Sewell, 1981). Thus, as noted by Reynolds
(1982), a greater degree of scientific skepticism may be needed for
examination of the issues surrounding test bias if the errors of the
past are to be avoided.

Personality Assessment Movement
Development of Traditional Tests

While tests of cognitive ability were rapidly evolving during the
early part of the century, tests of "personality" were in their
infancy. Although such devices as the Woodworth Personal Data Sheet
were used in the military during WWI, the personality assessment
movement received increased attention through the development of
projective  techniques  such  as  the  Rorschach  and  Thematic  Apperception 
Test  (TAT). 

World War II (WWII), like the first war, did much to set the stage
for rapid proliferation of testing practices. Indeed, psychological
testing combined with the military need for assessment was one of the
primary factors leading to the development of clinical psychology as an
independent specialty (cf. Maloney & Ward, 1976).

During the period following WWII, testing practices developed
dramatically. Many tests developed during this period were tied to an
intrapsychic disease model or state-trait conceptualization of behavior
(cf. Mischel, 1968). Psychoanalytic theory generally accelerated
assessment procedures that would reveal unconscious processes.
Assessment practices emphasized an "indirect-sign" paradigm.
Assessment was indirect in that measurement of certain facets of
behavior was disguised or hidden from the client (e.g., TAT).
Moreover, within the context of the intrapsychic model, testing
practices were said to predict certain states or traits. The
clinician's task was to administer a battery of tests to a client and
look for certain signs of traits or states. An example of this
approach was represented in the work of Rapaport, Gill, and Schafer
(1945). In their classic book the authors demonstrated how a battery
of tests (e.g., TAT, Rorschach, WMS) could be used to diagnose deviant
behavior within the intrapsychic model (in this case the psychoanalytic
model).

Similar to the sign approach was the "cookbook" method of
assessment that reached a zenith during the mid-1950s (cf. Meehl,
1956). An example of this approach was the Minnesota Multiphasic
Personality Inventory (MMPI) (Hathaway & McKinley, 1943). As these
authors note, one of the presumed advantages of the cookbook approach
was that "it would stress representativeness of behavioral sampling,
accuracy in recording and cataloguing data from research studies, and
optimal weighting of relevant variables and it would permit
professional time and talent to be used economically" (p. 243).

Emergence of Behavior Modification and Assessment

Behavior modification or behavior therapy and assessment
affiliated with this model have made a tremendous impact on psychology
and education in recent years, but this has been almost completely
overlooked in historical accounts of bias in assessment (see, however,
Kratochwill et al., 1980).

As recent historical reviews illustrate (Hersen, 1976; Kazdin,
1978), behavior therapy represents a departure from traditional models
of assessment and treatment of abnormal behavior, both psychological
and educational. Although the history of behavior therapy cannot be
traced along a single line, contemporary practice is characterized by
diversity of viewpoints, a broad range of heterogeneous procedures with
vastly different rationales, open debates over conceptual bases,
methodological requirements, and evidence of efficacy (Kazdin &
Wilson, 1978). Some reports of behavioral treatment followed Watson
and Rayner's (1920) work in conditioning of fear in a child, but a
significant impetus to behavioral treatment is commonly traced to the
publication of Wolpe's (1958) Psychotherapy by Reciprocal Inhibition.

Independent of Watson and Wolpe's work was research in the
psychology of learning, both in Russia and the United States.
Particularly important in learning research was operant conditioning,
which Skinner brought into focus in the late 1930s. The evolution of
operant work into experimental and applied behavior analysis has had an
important influence in the development of behavior therapy and
assessment practices in general.

Although behavior therapy and assessment have evolved considerably
over the past few years, some general characteristics represent unities
within the heterogeneity of contemporary practice:

1. Focus upon current rather than historical determinants of
   behavior;
2. Emphasis on overt behavior change as the main criterion by
   which treatment should be evaluated;
3. Specification of treatment in objective terms so as to make
   replication possible;
4. Reliance upon basic research in psychology as a source of
   hypotheses about treatment and specific therapy techniques;
   and
5. Specificity in defining, treating, and measuring the target
   problem in therapy (Kazdin, 1978, p. 375).

A detailed account of the history of behavior modification can be
found in Kazdin (1978).

With the advent of behavior modification and its proliferation, a
new assessment role also developed, particularly for clinical
psychologists. Behavioral assessment emphasized repeated measurement
of some target problem prior to (baseline), during, and after (follow-up)
the intervention. Hersen et al. (1976) note that the psychologist's
expertise in theory and application of behavioral therapy techniques
(e.g., classical and operant conditioning) also enabled both an
assessment and treatment role to emerge in psychiatric settings. Thus,
the psychologist in various settings (e.g., clinics, hospitals,
schools) became involved in direct service, rather than engaged in
testing and diagnosis. Behavior modification provided the impetus for
these new roles.

Developments in behavioral assessment have also influenced the
field of personality testing in general. In many respects assessment
has acted as a barometer for the current thinking of personality
theorists. For example, a barometer of changing views about assessment
can be seen in the title changes of the journal specifically devoted to
assessment in professional psychology (Goldfried, 1976). The journal,
initially founded in 1936, was entitled Rorschach Research Exchange.
Gradually other projective techniques came into existence in the
assessment process and by 1947 the title was changed to the Rorschach
Research Exchange and Journal of Projective Techniques. Because the
Rorschach became less dominant in assessment, the name was again
changed in 1950 to the Journal of Projective Techniques. Gradually,
the more objective personality assessment techniques (e.g., the MMPI)
were being used and in 1963 the title was changed to the Journal of
Projective Techniques and Personality Assessment. Projective
techniques continued to show disappointing research results and in 1971
this may have prompted the journal's change to its present title,
Journal of Personality Assessment. While it is unclear as to what the
next change in title will be, it is projected to be something like the
"Journal of Behavior and Personality Assessment".

Nevertheless, there has remained some doubt as to whether the
future direction of assessment will take a distinct behavioral
orientation. Even in 1963, when the journal Behavior Research and
Therapy made its appearance, the issue was raised as to whether there
would be a large enough readership to justify its existence (Brady,
1976). However, as Hersen and Bellack (1977) have documented, the
future looks very positive as reflected in major journals inaugurated
in the United States between the years 1968 and 1970 (Journal of
Applied Behavior Analysis, Behavior Therapy, Journal of Behavior
Therapy and Experimental Psychiatry). Moreover, several recent
journals have appeared (e.g., [...] and Research, Biofeedback and
Self-Control) and there are now some specific journals devoted
primarily to behavioral assessment (e.g., Behavioral Assessment,
Journal of Behavioral Assessment).

Evolution of Nondiscriminatory and Nonbiased Assessment
Testing as the Context

With the rapid proliferation of tests during the latter part of
this century a number of criticisms of tests and testing practices
emerged. Much of the controversy has been over tests of so-called
"mental ability" or "intelligence" (e.g., Black, 1963; Garcia, 1972;
Gross, 1962; Holman & Docter, 1972; Holtzman, 1971; Laosa, 1973b,
1977a, 1977b; Laosa & Oakland, 1974; Martinez, 1972; Mercer, 1972,
1973; Williams, 1971), particularly with minority group children (e.g.,
Kratochwill et al., 1980; Reschly, 1979). Indeed, the major
controversy in discriminatory or biased testing has been that because
minority group individuals typically score lower (or respond
differently to questions) on various conventional tests, discriminatory
or biased practices will result when vocational and/or educational
experiences are denied to these individuals (cf. Laosa, 1977). The
argument is advanced that many standardized tests are biased against
people whose backgrounds are other than middle-class, white, and
English speaking. While the arguments against traditional testing are
not limited to educational settings, it is in educational settings
that tests, especially ability measures, have been used to classify
individuals for various special education classes.

Biased Assessment Practices in Schools

Tests have been applied in a variety of settings. With the development
of group tests it became possible to test large numbers of school
children (Pintner, 1931). In elementary school:

The chief practical uses of tests up to the present time have
centered around their value for the purpose of classifying
children into more or less homogeneous intelligence groups, and
also for predicting their future success in school work. These
two purposes are intimately bound up with each other.
Classification in homogeneous groups is justifiable because
intelligence correlates highly with school success, and therefore,
the more homogeneous the group the more likely are the children in
the group to advance together at about the same rate, be that rate
relatively  fast,   normal,   or   slow   (Pintner ,   1931  ,  p.    23^)  . 
While homogeneous grouping was widely practiced by 1930 (McClure,
1930), critical reactions to this practice (e.g., Keliher, 1931) as
well as negative reviews (e.g., Rankin, 1931) did little to influence
the practice, which continued well into the future (cf. Carroll, 1978).
Indeed, Carroll (1978) noted that research on the efficacy and
usefulness of ability grouping had up to 1935 yielded no clear
conclusions and a continued negative tone has pervaded the more
contemporary period (cf. Svenson, 1962; Findley & Bergan, 1971). Of
course, part of the problem has been the inability of research to
elucidate solutions to the problem.

Schools have continued to be the focal point for analysis of the
use of tests. Unfortunately, testing practices in schools adhered to a
very restrictive model, particularly in ability testing:

Throughout the school system, from elementary school to the
university, the use of intelligence tests seems to have been
predicated on the assumption that their scores reflected mainly
innate or at least relatively unalterable characteristics of
students having to do with their capacity to do school work.
Although it was noted that average scores were correlated with
demographic variables such as socioeconomic class, race,
urban/rural environment, etc., there does not seem to have been
any serious consideration of whether children's home background,
or even their schooling, would have any important influence on
their performances in mental tests ... the question of whether
test scores were biased by cultural factors, for example, was
hardly ever raised during the developmental period of the mental
testing movement (Carroll, 1978, p. 36).

Over time, issues of bias or discrimination were increasingly
raised. Berdie (1965) noted that various tests may lead to
discriminatory practices. Mercer (1971, 1973, 1975) supported this
observation after studying the relations between membership in ethnic
minority groups and placement in classes for the mentally retarded in
public schools in California. Mercer (1975) noted:

We classified every person on the case register into ten groups
according to the median value of the housing on the block on which
he lived. We found that persons in the lowest socioeconomic
categories were greatly overrepresented on the register and those
from higher statuses were underrepresented. When we studied
ethnic groups, we found 300 percent more Mexican-Americans and 50
percent more blacks than their counterparts in the general
population but only 60 percent as many Anglo-Americans (Caucasians
whose primary language is English) as would be expected. Because
most Mexican-Americans and blacks in Riverside come from lower
socio-economic backgrounds, ethnic group and socioeconomic status
are correlated. When we held socioeconomic status constant,
Anglos were still underrepresented and Mexican-Americans were
still overrepresented in the case register but blacks appeared in
their proper proportion (p. 133).
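
The phrases "percent more" and "percent as many" in the passage above
both describe a ratio of observed to expected representation. The
following is a minimal sketch of that arithmetic only. The function
names and the counts are hypothetical, chosen to reproduce the quoted
magnitudes; they are not Mercer's data or her procedure.

```python
def representation_ratio(observed: float, expected: float) -> float:
    """Ratio of a group's observed count on a register to the count
    expected from its share of the general population."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    return observed / expected


def describe(ratio: float) -> str:
    """Express a ratio in the two phrasings used in the quotation."""
    if ratio >= 1.0:
        return f"{(ratio - 1.0) * 100:.0f} percent more than expected"
    return f"{ratio * 100:.0f} percent as many as expected"


# Hypothetical (observed, expected) counts, for illustration only.
hypothetical_counts = {
    "group A": (40, 10),    # four times the expected count
    "group B": (15, 10),    # one and a half times the expected count
    "group C": (60, 100),   # six tenths of the expected count
}

for group, (observed, expected) in hypothetical_counts.items():
    ratio = representation_ratio(observed, expected)
    print(f"{group}: ratio {ratio:.2f}, {describe(ratio)}")
```

Under this reading, "300 percent more" corresponds to a ratio of 4.0,
not 3.0, while "60 percent as many" corresponds to a ratio of 0.6.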

Mercer and her associates also found that this overrepresentation
of Mexican-American and black children in classes for the educable
mentally retarded was a statewide pattern and not just a local finding.
The implication of this was that children from certain low
socioeconomic groups or from ethnic minority groups are more vulnerable
to being classified as mentally retarded and that certain assessment
devices (mainly intelligence tests) are culturally biased.

Of course, these problems are not related only to tests of
intelligence and the mental retardation classification. Although IQ
tests may not be the primary reason for over- and underrepresentation of
minorities in special classes (Meyers, Sundstrom, & Yoshida, 1974),
legal issues have primarily focused on test bias as the reason for
disproportionate representation of minorities in special classes
(Reschly, 1979). Thus, it appears that abuses of intelligence testing
have received public scrutiny due to social and political consequences,
but many of the problems with intelligence testing have been true
of other kinds of norm-referenced assessment (cf. Salvia & Ysseldyke,
1978). Criticisms of standardized assessment have been focused on many
dimensions (Laosa, 1973b, 1977; Newland, 1973; Oakland, 197?, 1977;
Salvia & Ysseldyke, 1978; Thorndike & Hagan, 1969). Laosa (1977)
summarized these criticisms:

1. Standardized tests are biased and unfair to persons from cultural
and socioeconomic minorities since most tests reflect largely
white, middle-class values and attitudes, and they do not reflect
the experiences and the linguistic, cognitive, and other cultural
styles and values of minority group persons.

2. Standardized measurement procedures have fostered undemocratic
attitudes by their use to form homogeneous classroom groups which
severely limit educational, vocational, economic, and other
societal opportunities.

3. Sometimes assessments are conducted incompetently by persons who
do not understand the culture and language of minority group
children and who thus are unable to elicit a level of performance
which accurately reflects the child's underlying competence.

4. Testing practices foster expectations that may be damaging by
contributing to the self-fulfilling prophecy with low level
achievement for persons who score low on tests.

5. Standardized measurements rigidly shape school curricula and
restrict educational change.

6. Norm-referenced measures are not useful for instructional
purposes.

7. The limited scope of many standardized tests appraises only a part
of the changes in children that schools should be interested in
producing.

8. Standardized testing practices foster a view of human beings as
having only innate and fixed abilities and characteristics (pp.
10-11).

These, among other issues, are the primary focus of the remainder of
the report.

Summary and Conclusions

In this chapter we have provided an historical perspective on the
development of assessment practices in psychology and education. We
noted that assessment practices actually have their roots in antiquity.
Many assessment practices used today in psychological and educational
settings can actually be traced back to activities that occurred
hundreds of years ago. Thereafter, developments in France, Germany,
England, and the United States formed the basis for developments that
would occur in more formal and standardized testing.

A major movement called "differential psychology" formed the basis
for the rapid proliferation of ability testing in the United States.
Many tests of intelligence were developed to assess children's ability
to succeed in school. Many of the tests that were developed were
actually used to place children into special classes or for homogeneous
grouping procedures. The early Binet scale and its revisions as well
as group tests of intelligence were used for this purpose.

Some of the individuals who were active in development of early
tests held views that can be labeled as "racial". Questions were often
raised as to the motivations for test development and their subsequent
use as a result of the views held. It is clear from an historical
perspective that some of these positions could not and would not hold
up to empirical analysis. At the heart of many of these early
positions was the notion that observed differences in measured
intelligence among different racial or ethnic groups were due to genetic
differences. This issue has remained a central source of controversy in
present day research and writing on test bias.

Major developments also occurred in testing personality and
behavior. Following WWII, many traditional tests of personality (e.g.,
Rorschach) were used to assess children and adults. As a movement,
behavior modification was in part a reaction to traditional methods of
testing. Developments in this area of psychology and education have
had a tremendous impact on both assessment and the nature of special
education services provided to school children.

Finally, in the chapter we traced some of the more recent
developments in the area of assessment bias in educational settings.
Again, it was emphasized that standardized tests of ability have been
the primary focus of criticism in research and writing. Unfortunately,
many of the issues raised by test supporters and critics alike have not
been subjected to empirical research.

Chapter 3

Conceptual  Models  of  Human  Functioning: 
Implications  for  Assessment  Bias 


An extraordinary amount of theory and research has been generated
that has a bearing on bias in psychological and educational assessment.
As a result, a tremendous amount of data have accumulated concerning
the origins, development, influences, and variations in human behavior.
Nevertheless, the wealth of information has clearly not resulted in any
integrated view of human performance. Indeed, the current state of
knowledge generated from the various conceptual models has not only
resulted in the lack of an integrated view of human functioning, but
has yielded various conceptual positions that are diametrically
opposed.

Because our understanding of human behavior is influenced by basic
assumptions concerning the "why" of behavior, assessment practices
often become inextricably interwoven with the particular conceptual
model of human functioning held by the assessor. Different models,
with their different perspectives of behavior, yield vastly different
assessment approaches and data which are used in making decisions
relative to classification and intervention. Different conceptual
models must be considered in designing nondiscriminatory or non-biased
assessment programs (Mercer & Ysseldyke, 1977). Presumably, different
models will yield different diagnostic decisions and interventions.
The conceptual and psychometric validity and credibility of each
particular model must be evaluated and bias examined in light of
conceptual and methodological quality within various models.

In this chapter we review seven models of human behavior that
influence contemporary assessment practices. The models reviewed
include the medical or biogenetic model, intrapsychic disease model,
psychoeducational process or test-based model, behavioral model,
sociological deviance model, ecological model, and pluralistic model.
These various models have been discussed by others in the professional
literature. For our purposes, these models will also be examined in
light of the implications they hold for potential bias in assessment.
The models differ in their conceptualization of deviant behavior,
assessment procedures and devices (sometimes), as well as the nature of
the intervention employed. Because the behavior therapy model has not
received as much attention in the nonbiased assessment literature, and
because many behavioral procedures, such as task analysis, are being
advocated in non-biased assessment, we discuss this model in
relatively greater detail. Each model is discussed within the context
of various components and considerations in its use.


Medical Model

Components 

The medical model is one of the oldest approaches guiding
assessment and treatment. The medical model can be applied in either a
literal or metaphorical context (Phillips, Draguns, & Bartlett, 1975).
In this section we view the medical model in its literal sense. That
is, abnormal biological systems can be traced to some underlying
biological pathology which is then treated. For example, defective
hearing (symptom) may be traced to some type of infection (the cause)
which may be treated with antibiotics. The prevalence of medical
problems in the schools is actually quite high (Schroeder, Teplin, &
Schroeder, 1982). For example, May Lau, Lowenstein, Sinnette, Rogers,
and Novick (1976) screened 190 second-grade students from two schools
in Harlem. They found that 109 (57%) had a total of 170 health
problems. A variety of health problems may be found in the school,
including chronic illness, nutritional
disorders (undernutrition, obesity), hearing and visual disorders,
dental problems, disorders of bones and joints, infectious disorders,
respiratory disorders, allergic disorders, urinary disorders, blood
disorders, neurological problems, cardiovascular disorders, as well as
drug related problems (Schroeder et al., 1982). It seems clear that a
medical model is appropriate to deal with the diversity of
medical problems in the schools.

The  medical  model   is  a  disease-based  model.     The  pathology  is 
assumed  to  be  within  the   individual.     Some  theorists  consider 
biological  deviations  to  be  the  necessary  and  sufficient  factors  in  the 
development  of  the  pathology,  while  others  claim  that  chemical  or 
neurological  anomalies  are  the  necessary  but  not  sufficient  condition 
for  pathogenesis.     Here,  environmental  conditions  may  or  may  not 
catalyze  a  constitutional   predisposition  to  pathology. 

Considerations 

Medical model assessment procedures are clearly justifiable when
there is no basis for assuming physiological change in the organism as
a result of the socio-cultural environment. Appropriate use of the
medical model "should not yield racially or culturally discriminatory
results except to the extent that poverty and socioeconomic deprivation
are associated with particular groups and elevate the prevalence of
poverty-related organic pathologies in these groups" (Mercer &
Ysseldyke, 1977, p. 72). Discriminatory practices may very well
characterize medical model assessment when its procedures are used to
interpret measures of learned behavior (e.g., various forms of disruptive
behavior in children, academic skill deficits, etc.). Seventy or more
years of biological, biomedical, and genetic research have isolated
very few clear physical bases for recognized psychopathology (cf.
Phillips et al., 1975). While genetic, developmental, neurological
and biochemical factors all undoubtedly influence behavior, in reality
these factors are not discrete entities. They are interwoven with one
another as well as with environmental factors. This may have led
Ausubel (1969), in defending the concept of disease to describe
abnormal behavior, to contend that it is valid to consider a particular
symptom as both a manifestation of disease and a faulty interaction
with the environment.

Applications of the medical model may bias assessment in various
ways. Organic factors may not always be the cause of an observed
medical/physical problem. There is growing recognition that
psychological factors may affect a physical condition and that physical
symptoms may have no known organic or physiological basis (e.g.,
DSM-III). In the past, various concepts such as "psychosomatic" or
"psychophysiological" have been used to describe the psychological
basis for physical or somatic disorders. However, such perspectives
may also be of limited usefulness because they imply a simplistic
relation between psychological factors and a distinct group of physical
disorders when in fact, there may be a complex interaction of
biological, environmental, psychological, and social factors
contributing to various physical disorders (Siegel, 198^). Lipowski
(1977) noted:

The concept of psychogenesis of organic disease...is no longer
tenable and has given way to the multiplicity of all disease...the
relative contribution of these factors [social and psychological]
varies from disease to disease, from person to person, and from
one episode of the same disease in the same person to another
episode... If the foregoing arguments are accepted then it becomes
clear that to distinguish a class of disorders as "psychosomatic
disorders" and to propound generalizations about psychosomatic
patients is misleading and redundant. Concepts of single causes
and unilinear causal sequences, for example from psyche to soma
and vice versa, are simplistic and obsolete (p. 234).
The point here is that even in the treatment of physical disease,
psychological factors may be involved (Melamed & Siegel, 1980).
Exclusive reliance on medical assessments may bias treatment in the
sense that psychological (or other) aspects of functioning may be
involved.

The medical model is being used with increasing frequency in
psychology and education. For example, visual and hearing screening
are mandated in PL 94-142. A large number of different screening tests
are available for assessing physical factors (e.g., Meier, 1975;
Conner, Hoover, Horton, Sands, Sternfeld, & Wolinsky, 1975; Schroeder et
al., 1982). Thus, measures sensitive to organic conditions will be
[...] to environmental, psychological, and social factors.
As indicated above, problems most often arise when behavioral
measures that can be influenced by a variety of environmental
circumstances are employed to assess the potential organic origins of a
perceived symptom. The more the individual differences observed on a
behavioral measure are influenced by environmental factors, the more
the measure has the potential of being biased. Such a circumstance may
arise when the environmental factors that influence the measure differ
across groups. An example of one such measure is the Bender Visual
Motor Gestalt Test when it is employed within the medical model to
identify potential organic pathology. Although Mercer (1979) employs
the Bender in the SOMPA as a measure appropriate for interpretation
from within the medical model, she also reports significant correlations
between the Bender and various sociocultural measures and between the
Bender and ethnic groups. With respect to the latter, when using the
Koppitz (1963) scoring system, black children at each age level between
5 and 11 make approximately two errors more than white children.
Hispanic children at each of the same age levels make approximately one
error more than white children. In discussing the influence of social
and cultural factors on the Bender, Koppitz (1975) concluded that
children from different ethnic groups may develop visual-motor
perception skills such as those measured in the test at different
rates, and that these differences, in part, may be attributable to
factors such as cultural variations in child-rearing practices, and the
value that varying cultures place on these types of skills.
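
The mechanism described above can be made concrete with a small
numerical sketch. The Python fragment below is illustrative only: the
error counts, the two-point offset, and the cutoff are hypothetical
values chosen for the example and are not data from Mercer (1979) or
Koppitz (1963, 1975). It assumes two groups with identical underlying
score distributions, assumes that an environmental factor adds a
constant number of errors in one group, and shows how a single cutoff
then flags very different proportions of each group.

```python
# Hypothetical illustration: none of these numbers come from the studies cited.

def flag_rate(error_counts, cutoff):
    """Return the share of children whose error count exceeds the cutoff."""
    flagged = [e for e in error_counts if e > cutoff]
    return len(flagged) / len(error_counts)

# Assumed error counts for a reference group on a copying task.
group_a = [1, 2, 2, 3, 3, 4, 4, 5]

# Assumed environmental offset: the same children's scores, shifted by a
# constant number of errors unrelated to any organic condition.
ENVIRONMENTAL_SHIFT = 2
group_b = [e + ENVIRONMENTAL_SHIFT for e in group_a]

# Assumed cutoff above which a child is referred for possible organic pathology.
CUTOFF = 4

rate_a = flag_rate(group_a, CUTOFF)
rate_b = flag_rate(group_b, CUTOFF)

print(f"Flagged in group A: {rate_a:.3f}")
print(f"Flagged in group B: {rate_b:.3f}")
```

Under these assumed values the cutoff flags one child in eight in the
first group and five in eight in the second, although the groups were
constructed to be identical apart from the environmental offset.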

Psychodynamic Model

Components

The "psychodynamic model" implies that maladaptive behaviors are
symptoms resulting from underlying processes analogous to disease in
the literal sense. This model is sometimes labeled the medical model
in psychological and psychoeducational practice. Because
conceptualization and treatment of abnormal behavior initially resided
largely within the domain of medicine, the medical model was extended
to treatment of abnormal behavior, both medical and psychological.
While the historical developments of the model are not reviewed in
detail here, the reader is referred to several sources which discuss
this approach (e.g., Alexander & Selesnick, 1968; Kraepelin, 1962).

The psychodynamic approach is characterized by the following: "(a)
uses a number of procedures, (b) intended to tap various areas of
psychological functioning, (c) both at a conscious and unconscious
level, (d) using projective techniques as well as more objective and
standardized tests, (e) in both cases, interpretation may rest on
symbolic signs as well as scorable responses, (f) with the goal of
describing individuals in personological rather than normative terms"
(Korchin & Schuldberg, 1981, p. 1147). As is evident in the above
characterization, the psychodynamic approach is aimed at providing a
multifaceted description of the client. The psychodynamic approach has
also been characterized as involving a great deal of subjective
description and inference. This process is said to promote a unique
and individual approach to child assessment.

The psychoanalytic model represents one example of the
psychodynamic disease paradigm, as do many other dynamic models of human
functioning. The dynamic approach to assessment of deviant behavior is
best elucidated within the context of assumptions held about the
internal dynamics of personality (Mischel, 1968). Traditionally,
dynamic approaches have inferred some underlying constructs that
account for consistency in behavior. Assessment is viewed as a means
of identifying some sign of these hypothetical constructs which are of
central importance in predicting behavior. This indirect sign paradigm
in assessment (cf. Mischel, 1972, p. 119) includes a large variety of
projective tests (e.g., Rorschach, TAT, Figure Drawings, Sentence
Completion Tests) as well as "objective" personality inventories
(e.g., MMPI, California Psychological Inventory).

A second feature of the traditional psychodynamic approach is that
it assumes that behavior will remain quite stable regardless of the
specific environmental or situational context. In this regard, test
content is of little concern and may even be disguised by making items
ambiguous, as is true in projective testing (Goldfried & Sprafkin,
1974). Indeed, a particular response from a projective test is rarely
examined in view of the overt properties of the situation in which it
occurred, but is rather interpreted within the context of a complex
theoretical structure.

Considerations

Cheney and Morse (1972) have criticized the dynamic approach to
assessment on three grounds. One problem is the preoccupation with
historical events, often in the absence of any verifying data. The
second criticism relates to the emphasis during assessment on the
individual's presumed unconscious beliefs, attitudes, motivations, and
so forth, as interpreted through projectives. Cheney and Morse
charge that this technique is bound more in theory than in evidence.
Third, behavior is assumed to be a consequence of internalized
pathological features. This assumption ignores evidence showing that
many behaviors are situationally specific.

The use of various psychodynamic indirect measurement procedures
has direct implications for nonbiased assessment. These measures
continue to be used in clinical practice despite data indicating their
low predictive validity (cf. Hersen & Barlow, 1976). For example,
Goldfried and Kent (1972) note that although the interpretation of
certain signs on the Bender-Gestalt test (Hutt & Briskin, 1960) has no
empirical support (cf. Goldfried & Ingling, 1964; Hutt, 1968), the
revised version of the Bender-Gestalt manual presumably discounted
these research findings and still recommended the use of questionable
interpretations. A rather extensive literature on the comparative
(predictive) validity for indirect measurement techniques (Mischel,
1968, 1971) suggests that predictions made on the basis of self-reports
are equal to or superior to those made on the basis of indirect
measurement techniques that are interpreted and scored by "clinical
experts". These findings hold true for a wide variety of content areas
(cf. Mischel, 1972).

While there are major problems in the predictive validity of
indirect measurement techniques, responses generated in the test
situation are also subject to a variety of situational and examiner
influences (cf. Hersen & Barlow, 1976). Masling (1960) documented the
influence of situational and interpersonal variables, and since then a
number of writers have further validated this problem (e.g., Hamilton &
Robertson, 1966; Harris & Masling, 1970; Hersen, 1970; Hersen &
Greaves, 1971; Marwit & Marcia, 1967; Masling & Harris, 1969; Simkins,
1960).

Perhaps the most important issue that has been raised over
traditional dynamic assessment is its relation to treatment. A number
of authors have noted that there appears to be little relation between
traditional assessment and treatment (Bandura, 1969; Goldfried &
Pomeranz, 1968; Kanfer & Phillips, 1970; Peterson, 1968; Stuart, 1970).
Thus, while traditional dynamic assessment may lead to a diagnosis
which may in turn lead to the recommendation of a particular treatment,
diagnoses resulting from traditional assessment methods cannot
accurately predict what particular treatment mode should be implemented
(Ciminero, Calhoun, & Adams, 1977; Stuart, 1970).

Psychometric Test-Based or Psychoeducational Process Model

The psychoeducational process and psychometric test-based model
also bears similarity to the psychodynamic disease model in that
underlying processes, or specifically process deficits, are said to
account for learning and behavior problems. In many respects this
model can be considered a part of the dynamic model discussed above.
However, in contrast to this model, a psychometric approach is
characterized by the use of a variety of individual and group tests to
compare individuals along various trait dimensions. Within
trait-theory approaches, various personality structures are said to
account for an individual's behavior (Mischel, 1968, 1974). Trait
theorists disagree on what traits explain certain patterns of behavior,
but generally agree that certain behaviors are consistent across time
and settings and that these patterns are expressions or signs of
underlying traits.

In contrast to the psychodynamic approach, trait assessors
typically have placed a high premium on objective administration and
scoring of tests. Attempts have usually been made to establish formal
reliability and validity of the various measures used. On empirical
grounds, this "statistical" approach has proved generally superior to
the more "clinical method" in predicting behavior (cf. Korchin &
Schuldberg, 1981), but questions have been raised over the
manner in which the research reflects the reality of decision making in
actual clinical practice.

Closely related to the psychometric approach is the
psychoeducational process model. The model can be considered analogous
to the psychometric trait model in that assessment focuses on internal
deficits, except its context is psychoeducational rather than
personality or emotionally oriented. Mercer and Ysseldyke (1977) list
six characteristics of this model. These include: (a) the model is a
continuous model based upon the degree of deficit present within the
child, (b) the model assumes that adequate development of
psychoeducational processes is necessary to the adequate development
of academic skills, (c) the model is a deficit model, (d) the deficits
or disabilities are viewed as existing within the child, (e) deficits
can exist unnoticed, and (f) the model is completely culture bound in
that processes are considered necessary to the acquisition of
socially defined goals (cf. Ysseldyke & Bagnato, 1976).

Within this model exceptionality can be due to one or a
combination of three philosophical positions (Quay, 1973). First is
the position that exceptional children experience dysfunctions in
certain processes that are critical to learning. In this regard, the
problem is considered to lie within the child; it is assumed "that
the dysfunction ... is not remediable and must be bypassed or, at
best, be compensated for" (Quay, 1973, p. 166). A second perspective
on exceptionality is the experiential-defect view in which various
dysfunctions (e.g., neurological organization) are due to defects in
experience, such as in crawling. A third view is that the child
experiences a deficit in which a limited behavioral repertoire is the
basis for learning problems. Finally, these approaches may operate in
combination wherever learning problems are due to process dysfunctions,
experience defects, and experience deficits (Ysseldyke & Mirkin,
1982).

Since a variety of cognitive, perceptual, psycholinguistic, and
psychomotor processes or abilities have been cited as causes of
children's academic failure, norm-referenced "cognitive" (e.g., WISC-R,
McCarthy, Stanford-Binet), "perceptual" (e.g., Bender Visual Motor
Gestalt Test, Developmental Test of Visual Perception, Developmental
Test of Visual-Motor Integration), "psycholinguistic" (e.g., Illinois
Test of Psycholinguistic Abilities), and "psychomotor" (e.g., Purdue
Perceptual-Motor Survey) tests are used to assess these abilities.

Most of these assessment procedures follow a diagnostic-prescriptive
approach. Ysseldyke and Mirkin (1982) note:

All of the diagnostic-prescriptive approaches based on a process
dysfunction viewpoint of the nature of exceptionality operate
similarly. When students experience academic difficulties it
is presumed that the difficulties are caused by inner process
dysfunctions or disorders. Tests are administered in an effort
to identify the specific nature of the within-child disorder that
is creating or contributing to learning difficulties. Disorders
... (auditory sequential memory deficits, body image problems,
eye-hand coordination difficulties, visual association
dysfunctions, and manual expression disorders). Specific
interventions are developed to "cure" the underlying
causative problems (p. 398).

Considerations 

There are several important implications that can be raised with
regard to the assessment tactics used within the process or
psychometric model. First, since norm-referenced devices are commonly
used within the model, the clinician must assume that clients tested
have comparable acculturation to those on whom the test was
standardized (cf. Newland, 1973; Oakland & Matuszek, 1977). Yet the
point has frequently been raised that standardized tests are biased and
unfair to individuals from cultural and socioeconomic minorities
because they reflect predominantly white, middle-class values and do
not reflect experiences and the linguistic, cognitive, and other
cultural values and styles of minority individuals (Laosa, 1977). For
example, although the norms for some tests (e.g., some group
achievement and aptitude tests, the Stanford-Binet, 1972, and WISC-R) are
generally good, norms on other instruments are quite inadequate
(e.g., ITPA, Leiter International Performance Scale, Slosson
Intelligence Test).
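
The dependence of a norm-referenced score on the standardization
sample can be shown with a brief sketch. The fragment below is
illustrative only and rests on stated assumptions: the two norm samples
and the raw score are hypothetical, and the simple z-score used here is
a stand-in for whatever scaling a particular published test actually
employs. It shows that one raw score is read as below average against
one norm sample and as above average against another.

```python
from statistics import mean, pstdev

def standard_score(raw, norm_sample):
    """Express a raw score as a z-score relative to a norm sample."""
    return (raw - mean(norm_sample)) / pstdev(norm_sample)

# Hypothetical norm samples; neither is taken from any published test.
norm_sample_x = [8, 10, 12, 14, 16]   # assumed standardization sample
norm_sample_y = [4, 6, 8, 10, 12]     # assumed sample with different experiences

raw_score = 10  # the same observed performance in both comparisons

z_against_x = standard_score(raw_score, norm_sample_x)
z_against_y = standard_score(raw_score, norm_sample_y)

print(f"z relative to sample X: {z_against_x:.2f}")
print(f"z relative to sample Y: {z_against_y:.2f}")
```

The interpretation of the score therefore changes with the choice of
comparison group, which is why the comparability of the client to the
standardization sample matters in this model.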

A second issue is that research examining components of
reliability and validity on various process measures has not been
optimistic (cf. Ysseldyke, 1973, 1975, 1977; Ysseldyke & Salvia, 1974;
Salvia & Ysseldyke, 1978). For example, several reviews of research on
... (..., 1973; Sedlak & Weener, 1973) have drawn attention to these
limitations. The magnitude of the problem of inadequate norming,
inadequate or incomplete reliability data, or questionable validity is
nicely represented in data presented by Salvia and Ysseldyke (1978).
Clearly, the potential for biased assessment practices is high given the
poor psychometric properties of these instruments.

Aside from the psychometric issues of these assessment approaches
(i.e., norming, reliability, and validity), an important issue is the
degree to which intervention programs based on these assessment models
have been effective. A considerable amount of research has been
conducted on ability-training approaches (see Ysseldyke & Mirkin, 1982
for a review). These authors noted that there have been major
challenges presented to optometric vision training programs (e.g.,
Keogh, 1974), visual-perceptual training (e.g., Hammill, Goodman, &
Wiederholt, 1974), auditory-perceptual training (e.g., Goodman &
Hammill, 1973), and psycholinguistic training (e.g., Sedlak & Weener,
1973). Although the jury may still be out on these various procedures,
there has been considerable compelling evidence that they have not been
effective. Therefore, the issue that must be raised is that these
procedures may bias the assessment-intervention process. We have
labeled this outcome bias (see Chapter 6).

Finally, approaches based on the measurement of psychological
processes rather than on directly observable features raise questions
of bias in mental testing (Reynolds, in press). Most of the tests
described in this section measure traits or constructs that are not
directly observable and are obviously defined differently by different
individuals ... Thus, various
criticisms that have been advanced against the trait approach in
general (e.g., Kazdin, 1975) would apply to these procedures. For
example, one major criticism of trait testing approaches is that the
score one obtains on a test is usually thought to reflect the property
of the individual assumed to be measured by the test (e.g.,
intelligence from intelligence tests, visual sequential memory from the
ITPA, and aggression from a projective test). As Tryon (1979) has
noted, this sets up a test-trait fallacy that begins with the faulty
assumption that: (1) test scores are trait measures; (2) trait measures
are basic properties of the person; and (3) test scores reflect basic
properties of the person. "This sequence essentially converts a
dependent variable into an independent variable; hence a measurement is
reified into a causal force. It should also be emphasized that the
unsound logic of drawing inferences about ability on the basis of
observed performance is integral to the test-trait fallacy" (Tryon,
1979, p. 402). It is possible that adoption of this "test-trait
fallacy" can lead to bias in assessment.

It may not be useful to lump all tests together and indicate that
they are biased simply because they measure processes (Reynolds, in
press). Clearly, some tests are better than others on the basis of
certain psychometric criteria. However, tests and testers that embrace
the process model will continue to have the problems associated with
this model as elucidated above (see also Fiske, 1979). Thus, it seems
doubtful that addressing the question of bias on a test-by-test basis
will solve the fundamental problem of the conceptual model embraced by
these approaches.

Components 

Technically, there is no one model of behavior therapy. Also,
contemporary behavior therapy, despite commonalities, is characterized
by a great deal of diversity. The different approaches in contemporary
behavior therapy include applied behavior analysis (e.g., Baer, Wolf, &
Risley, 1968; Bijou, 1970), the mediational S-R model (e.g., Rachman,
1963; Wolpe, 1958), social learning theory (e.g., Bandura, 1969, 1977),
and cognitive behavior modification (e.g., Meichenbaum, 1974, 1977;
Mahoney, 1974a; Mahoney & Arnkoff, 1978). These approaches are only
briefly reviewed here. The reader is referred to Kazdin and Wilson
(1978) as well as original sources within each approach for a more
detailed presentation. The following section is adapted from
Kratochwill (1982).

Applied Behavior Analysis. This form of behavior therapy developed from
the experimental analysis of behavior (cf. Day, 1976; Ferster &
Skinner, 1957; Sidman, 1960; Skinner, 1945, 1953, 1957, 1969, 1974).
It emphasizes the analysis of the effects of independent events
(variables) on the probability of specific behaviors (responses).
Applied behavior analysis focuses on behaviors that are clinically or
socially relevant (e.g., various social behaviors, learning disorders,
mental retardation, social skills, etc.) and adheres to certain
methodological criteria (e.g., experimental analysis, observer
agreement on response measures, generalization of therapeutic effects).

Advocates of applied behavior analysis employ a more restrictive
... behavior therapy. Behavior refers to "the observable activity of the
organism as it moves about, stands still, seizes objects, pushes and
pulls, makes sounds, gestures, and so on" (Skinner, 1972a, pp.
260-261). Internal feelings and cognitions are typically not
considered a proper focus for the techniques of therapy, research, and
practice. However, it must be stressed that applied behavior analysis
focuses on the behavior of an individual as a total functioning
organism, although there is not always an attempt to observe, measure,
and relate all of an organism's responses taking place at one time
(Bijou, 1976; Bijou & Baer, 1978).

Many intervention procedures associated with applied behavior
analysis are derived from basic laboratory operant research [e.g.,
positive and negative reinforcement, punishment, time-out, response
cost, shaping, fading, stimulus control, and many others (see Bijou,
1976; Gelfand & Hartmann, 1975; Kazdin, 1980; Sulzer-Azaroff & Mayer,
1977)]. Assessment emphasizes the individual application of these
procedures and a functional evaluation of their effectiveness (Bijou &
Grimm, 1975; Emery & Marholin, 1977). Behavior analysis refers to the
study of organism-environment interactions in terms of empirical
concepts and laws for understanding, predicting, and controlling
organism behavior and repeated measurement of well-defined and
clearly observable responses (Bijou, 1976; Bijou, Peterson, & Ault,
1968; Bijou, Peterson, Harris, Allen, & Johnson, 1969).

Neobehavioristic Mediational S-R Model. The neobehavioristic
mediational S-R model is derived from the work of such learning
theorists as Pavlov, Guthrie, Hull, Mowrer, and Miller (e.g., Eysenck,
...) ... characterized by "the application of the principles of
conditioning, especially classical conditioning and counter-conditioning
to the treatment of abnormal behavior" (Kazdin & Wilson, 1978, p. 3).
Although intervening variables and hypothetical constructs play a role
in assessment and intervention, covert activities are most commonly
defined in terms of a chain of S-R reactions with cognitive
formulations de-emphasized.

A number of treatment procedures such as counter-conditioning and
systematic desensitization have been used to treat anxiety reactions,
phobic patterns, and other strong emotional disorders in children
(Morris & Kratochwill, 1983). Systematic desensitization, based on the
principle of reciprocal inhibition (Wolpe, 1958), has been successfully
used to treat a wide range of child and adult problem behaviors (cf.
Bandura, 1969; Paul, 1969b, 1969c; Rachman, 1967; Paul & Bernstein,
1973). Assessment within the mediational S-R model relies on survey
schedules (e.g., fear survey schedules) and self-report data, and
direct measures of client behavior (as in the use of behavioral
avoidance tests).

Cognitive Behavior Therapy. Many of the procedures subsumed under the
rubric of cognitive behavior therapy evolved outside the mainstream of
behavior therapy (Kendall, 1981). A unifying characteristic of the
cognitive behavior therapy approach is an emphasis on cognitive
processes and private events as mediators of behavior change. The
source of a client's problems is said to be related to their own
interpretations and attributions of their behavior, thoughts, images,
self-statements, and related processes (Kazdin & Wilson, 1978).

... rational emotive therapy, Beck's cognitive therapy, and Meichenbaum's
self-instructional training. Treatment strategies are quite diverse
(cf. Mahoney & Arnkoff, in press; Meichenbaum, 1974, 1977) and include
such techniques as problem solving, stress inoculation,
self-instructional training, coping skills training, language behavior
therapy, thought stopping, and attribution therapy. These techniques
represent procedures not generally addressed by other behavior therapy
approaches (e.g., applied behavior analysis) and in some cases
emphasize components of a given technique where the interpretation for
its efficacy is yet to be resolved (Kazdin & Wilson, 1978).

Assessment in cognitive behavior therapy has tended to be quite
broad based, taking into account many different dimensions of
"behavior". Yet, there is still an emphasis on defining the nature of
the target problem, whether this be overt or covert. In some cases,
a more traditional functional analysis of behavior, which emphasizes a
careful examination of environmental antecedents and consequents as
related to a certain response repertoire, is explored (e.g.,
Meichenbaum, 1977).

Some specific purposes for cognitive assessment have been outlined
by Kendall (1981):

1.  To study the relationships among covert phenomena and their
    relationship to patterns of behavior and expressions of
    emotion.
2.  To study the role of covert processes in the development of
    distinct psychopathologies and the behavioral patterns
    associated with coping.
3.  To confirm the effects of treatment.
4.  To check studies where cognitive factors have either been
    manipulated or implicated in the effects of the manipulation
    (pp. 3-4).

Some specific aspects of cognitive behavioral assessment are discussed
in Chapter 7.

Social Learning Theory. Social learning theory is based on the work of
Bandura and his associates (e.g., Bandura, 1969, 1971, 1974, 1977;
Bandura & Walters, 1963) and has evolved considerably over the past few
years. Bandura (1974) initially noted that "contrary to the mechanistic
metaphors, outcomes (i.e., reinforcing events) change human behavior
through the intervening influence of thought" (p. 859). More recently,
Bandura (1977b, 1981) has also noted that in addition to outcome
expectation, a person's sense of his/her ability to perform a certain
behavior mediates performance. Bandura (1977b, 1981) refers to these
latter expectations as efficacy expectations or self-efficacy, and
suggests that they have important implications for intervention.
Psychological treatment and methods are hypothesized to produce changes
in a person's expectations of self-efficacy, as in the treatment of
phobic behavior. Self-efficacy is said to determine the activation and
maintenance of behavior strategies for coping with anxiety-eliciting
situations. Self-efficacy expectations are also said to be modified by
different sources of psychological influence, including
performance-based feedback (e.g., participant modeling), vicarious
information (e.g., symbolic modeling), and physiological changes (e.g.,
traditional verbal psychotherapy) (cf. Kazdin & Wilson, 1978).

Intervention procedures such as symbolic modeling (e.g., Bandura,
...; Bandura, 1977; Rosenthal ..., 1978) and self-modeling (... &
Brody, 197...; Micklich & Creer, 1977) have been associated with the
social learning theory approach. For example, modeling has been used
to treat a variety of children's fears (e.g., animal fears, inanimate
fears, dental and medical fears; Morris & Kratochwill, 1983),
socially maladjusted children (e.g., social withdrawal, aggression),
distractibility, and severe deficiencies (e.g., autism, mental
retardation; cf. Kirkland & Thelen, 1978), as well as a wide range of
academic behaviors (cf. Zimmerman, 1977).

Social   learning  theory  stresses  that  human  psychological 
functioning   involves  a  reciprocal   interaction  between  the  individual's 
behavior and the environment in that a client is considered both the
agent and the target of environmental influence, with assessment
focusing on both dimensions of behavior.

Unifying Characteristics. Despite apparent diversity among the
different areas within behavior therapy, several dimensions set it
apart from traditional forms of psychological assessment and
intervention, particularly the test-based psychometric models and
psychodynamic models. Contemporary behavior therapy is characterized
by the following:

(1) a strong commitment to empirical evaluation of treatment
and intervention techniques.

(2) a general belief that therapeutic experiences must provide
opportunities to learn adaptive or prosocial behavior.

(3) specification of treatment in operational and, hence,
replicable terms.

(Kazdin & Hersen, 1980, p. 287).

Behavior  therapy  has  become  very  diverse  and  now  includes  a  number 
of  therapeutic  strategies  that  were  once  excluded  from  the  field  (e.g., 
rational  emotive  therapy).     Although  these  characteristics  are  tied  to 
the therapeutic aspects of the behavioral approach, each can also be
conceptually representative of the behavioral approach to assessment.
In the sections that follow, the methodological and conceptual issues of
behavioral assessment are outlined.

Behavior Therapy Approaches to Assessment of Learning and Behavior
Disorders

Behavioral assessment has evolved considerably in the past few
years. The foundation has been laid for behavioral assessment within
the area of social-behavior disorders (e.g., Ciminero et al., 1977;
Ciminero & Drabman, 1978; Hersen & Bellack, 1976) and there has been
attention directed toward assessment of learning disorders (e.g., Bijou
& Grimm, 1975; Kratochwill, 1982, 1982; Lovitt, 1975a, 1976b; Ross,
1976).

Applications of Behavioral Assessment. Behavioral assessment can be
used in treatment, selection, and research (Goldfried & Linehan, 1977).
Its application to determine features of the person and environment
that maintain deviant behavior is one of the most common uses (see
Bellack & Hersen, 1978; Goldfried & Davison, 1976; Goldfried &
Pomeranz, 1968).

Behavioral assessment is also used for selection purposes wherein

i    ;  .     :  •  i;  ■ .  ■  ■  ■  ■  1  on 

treatment. Behavioral assessment has not been applied extensively in
this area (cf. Goldfried & Linehan, 1977; Wiggins, 1973), but there is
increasing work in developing measures which allow prediction of
treatment program success.

Finally, behavioral assessment is used in research. Assessment
and design methodology have been an identifying feature of behavior
therapy and its scientific basis, where empirically validated principles
and procedures are used for the systematic evaluation of clinical
interventions (Bellack & Hersen, 1978). Although this research base
has not been limited to one methodology, a large amount of research
employing behavioral assessment strategies has been conducted through
single case experimental methodology, especially in applied behavior
analysis (cf. Hersen & Barlow, 1976; Kazdin, 1982; Kratochwill, 1978).
A major feature of assessment within behavior therapy single case
research is the repeated measurement of the target response (cf. Bijou
et al., 1968; Bijou et al., 1969) on cognitive, motor, and
physiological dimensions (discussed in more detail in Chapter 7). As
reflected in the features described by Kazdin and Hersen (1980), an
emphasis is placed on direct measurement techniques (actual measures of
the target responses through the three content areas), rather than
on indirect measurement (e.g., projective tests, perceptual motor
scales, personality inventories, etc.).

Some Distinctions Between Behavioral and Traditional Assessment. There
are numerous conceptual and methodological differences between
behavioral and traditional assessment but the major differences emanate
from the underlying assumptions that each approach adheres to in
characterizing human performance. It is even possible that the same
assessment techniques (e.g., criterion-referenced assessment, direct
observation) could be used in both traditional and behavioral
assessment. The various ways writers in the behavioral assessment area
have contrasted behavioral and traditional [trait (psychometric) or
state (dynamic)] approaches to assessment have been compiled
by Hartmann, Roper, and Bradford (1979) and are presented in Table 3.1.
Behavioral assessment is usually characterized by relatively fewer
inferential assumptions about personality, remaining instead closer to
observable behavior. As noted in the previous section, most
non-behavioral approaches to assessment and treatment conceive of
behavior as relatively stable and enduring and relate learning and
behavior disorders to internal processes or characteristics.

In behavioral assessment, inferred causes of a disorder are
bypassed in favor of a careful environmental analysis of the problem
and observable skill deficiencies. Moreover, the intervention program
would typically focus on specific skill training rather than on
underlying process remediation. These approaches have sometimes been
identified as a skill training approach (Ysseldyke & Mirkin, 1982).
Thus, within a behavioral assessment framework, it is useful to view an
individual's learning as one would view the acquisition of a specific
set of skills (and, conversely, a learning problem as a set of specific
skill deficiencies).

Table 3.1

Differences Between Behavioral and Traditional Approaches to Assessment

I. Assumptions

1. Conception of personality
   Behavioral:  Personality constructs mainly employed to summarize
                specific behavior patterns, if at all
   Traditional: Personality as a reflection of enduring underlying
                states or traits

2. Causes of behavior
   Behavioral:  Maintaining conditions sought in current environment
   Traditional: Intrapsychic or within the individual

II. Implications

1. Role of behavior
   Behavioral:  Important as a sample of person's repertoire in
                specific situation
   Traditional: Behavior assumes importance only insofar as it
                indexes underlying causes

2. Role of history
   Behavioral:  Relatively unimportant, except, for example, to
                provide a retrospective baseline
   Traditional: Crucial in that present conditions seen as a product
                of the past

3. Consistency of behavior
   Behavioral:  Behavior thought to be specific to the situation
   Traditional: Behavior expected to be consistent across time and
                settings

III. Uses of data
   Behavioral:  To describe target behaviors and maintaining
                conditions; to select the appropriate treatment; to
                evaluate and revise treatment
   Traditional: To describe personality functioning and etiology; to
                diagnose or classify; to make prognosis; to predict

IV. Other characteristics

1. Level of inferences
   Behavioral:  Low
   Traditional: Medium to high

2. Comparisons
   Behavioral:  More emphasis on intraindividual or idiographic
   Traditional: More emphasis on interindividual or nomothetic

3. Methods of assessment
   Behavioral:  More emphasis on direct methods (e.g., observations
                of behavior in natural environment)
   Traditional: More emphasis on indirect methods (e.g., interviews
                and self-report)

4. Timing of assessment
   Behavioral:  More ongoing; prior, during, and after treatment
   Traditional: Pre- and perhaps posttreatment, or strictly to
                diagnose

5. Scope of assessment
   Behavioral:  Specific measures and of more variables (e.g., of
                target behaviors in various situations, of side
                effects, context, strengths as well as deficiencies)
   Traditional: More global measures (e.g., of cure, or improvement)
                but only of the individual

(Source: Hartmann, D.P., Roper, B.L., & Bradford, D.C. Some
relationships between behavioral and traditional assessment.
Journal of Behavioral Assessment, 1979, 1, 3-21. Reproduced
by permission).

A distinction made by Goodenough (1949) between a "sign" and
"sample" approach to test interpretation has often been suggested as
another dimension on which to distinguish between traditional and
behavioral assessment (Goldfried, 1976; Goldfried & Kent, 1972). When
test responses are viewed as a sample, it can be assumed that they
parallel the way in which a child is likely to behave in a nontest
situation. When test responses are viewed as signs, an inference
is made that the performance is an indirect manifestation of some other
characteristic. This feature is demonstrated in the previous section
wherein we noted that within traditional assessment a child is said to
demonstrate low or poor performance on the visual perceptual memory
subtest of the ITPA, wherein it is assumed that underlying visual
perceptual processes may be impaired. Such an emphasis on sign
approaches also promotes determining the underlying causes of academic
and/or social problems. Behavioral assessment places less emphasis on
historical conditions and so such factors as developmental history are
of secondary importance (Haynes, 1978). But when historical factors
are considered in behavior analysis they are examined in terms of
interactional history where physical and social conditions result in a
wide range of behavioral repertoires (Bijou, 1976; Bijou & Baer,
1965).

Viewing academic/learning problems within the traditional or
behavioral approaches has important implications for test development
("test" is used broadly to refer to a variety of assessment
procedures). Goldfried and his associates (Goldfried, 1976; Goldfried
& Linehan, 1977; Goldfried & Kent, 1972) provided a conceptual
framework for contrasting these opposing models. Within traditional
assessment, the nature of the situation in which the individual is
functioning is usually of less interest in assessment than are such
factors as the dynamic or structural components. Within a behavioral
orientation the skills conception of learning problems implies that
comprehensive and carefully sampled task requirements be reflected
within one's assessment. In this context, the conventional notion of
content validity of the test becomes particularly crucial, since one
must obtain a representative sample of those situations in which a
particular behavior of interest is likely to occur. Thus, in
assessment of a learning problem, this includes both the content of the
test per se, as well as the situation in which the test is administered
(Bijou, 1976).

Models of Behavioral Assessment. Several models of behavior assessment
have evolved and are listed in Table 3.2. These models reflect the
diversity that exists in contemporary behavior therapy as well as the
movement towards inclusion of more cognitive factors in assessment
(Kratochwill, 1982). The basic S-R model was expanded by Lindsley
(1964) to include stimulus (S), response (R), contingency (K), and
consequence (C) [the "S" refers to the antecedent events or
discriminative stimuli, the "R" refers to behaviors, the "K" represents
various contingencies (e.g., schedules of reinforcement), and the "C"
denotes the consequences of the behavior (e.g., presentation or removal
of positive or negative reinforcement)]. An A-B-C
(antecedents-behaviors-consequences) model was proposed by Stuart
(1970). An even more expanded model was proposed by Kanfer and Saslow
(1969) (see also Kanfer & Phillips, 1970) who added an O to expand to
a S-O-R-K-C formulation. Similarly, Goldfried and Sprafkin (1974)
presented an expanded S-O-R-C model for a behavioral analysis.

Table 3.2

Models of Behavioral Assessment

Model                   Source

S - R                   Ferster (1965)
                        Skinner (19  )
S - R - K - C           Lindsley (1964)
A - B - C               Stuart (1970)
S - O - R - K - C       Kanfer & Saslow (1969)
                        Kanfer & Phillips (1970)
S - O - R - C           Goldfried & Sprafkin (1974)
T - PA - PI - PE        Bergan (1977)
BASIC - ID              Lazarus (1973)

Source: Kratochwill, T.R. Advances in behavioral assessment. In C.R.
Reynolds and T.B. Gutkin (Eds.) Handbook of school psychology. New
York: John Wiley & Sons, 1982.
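
The component letters of these models can be read as fields of a single observation record. The following is only an illustrative sketch in Python: the class name, field names, and the example episode are hypothetical and do not come from Kanfer and Saslow or the other authors cited. The "O" field is labeled as the organism's condition, which is the usual reading of that letter; the text above does not expand it.

```python
from dataclasses import dataclass


@dataclass
class SORKCRecord:
    """One observed episode coded with the S-O-R-K-C letters (hypothetical layout)."""
    stimulus: str      # S: antecedent event or discriminative stimulus
    organism: str      # O: condition of the individual (assumed reading of "O")
    response: str      # R: the observed behavior
    contingency: str   # K: contingency, e.g., schedule of reinforcement
    consequence: str   # C: what followed the behavior

    def abc(self):
        """Collapse the record to the simpler A-B-C view (antecedent, behavior, consequence)."""
        return (self.stimulus, self.response, self.consequence)


# Invented example episode, for illustration only.
record = SORKCRecord(
    stimulus="teacher gives a written arithmetic task",
    organism="child reported as tired",
    response="child leaves seat",
    contingency="attention follows nearly every instance",
    consequence="teacher attention",
)
print(record.abc())
```

Seen this way, the A-B-C model is the subset of the S-O-R-K-C record that drops the O and K fields, which is one way to picture how the later models expand the earlier ones.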

A commonly used model of behavioral assessment is the Kanfer and
Saslow (1969) scheme which includes seven specific components:

1. An initial analysis of the problem situation in which the
various behaviors that brought the client to treatment are
specified;

2. A clarification of the problem situation in which various
environmental variables (e.g., stimuli and responses) are
specified;

3. A motivational analysis in which reinforcing and punishing
stimuli are identified;

4. A developmental analysis in which biological, sociological, and
behavioral changes of potential relevance to the treatment are
identified;

5. An analysis of self-control in which the situations and
behaviors the client can control are identified;

6. An analysis of social situations in which the interpersonal
relationships of individuals in the client's environment and
their various aversive or reinforcing qualities are specified;

7. An analysis of the social-cultural-physical environment in
which normative standards of behavior and the client's
opportunities for support are evaluated.

The Kanfer and Saslow (1969) model can assist the professional in
clarifying problem behaviors and elucidating environmental factors
related to the target problem. Several positive features of this
system are apparent (Ciminero & Drabman, 1978). Unlike many systems,
the S-O-R-K-C model includes many components ignored by other models
(e.g., biological, social-cultural, reinforcement history,
developmental factors); the model focuses on positive (assets) as well
as negative (deficit) behaviors; in the behavior analysis tradition,
the model is individualized for each client, thereby increasing the
probability of an individualized treatment for each client.

Despite these positive aspects, the Kanfer and Saslow (1969)
system should be used in the context of three considerations
(Kratochwill, 1982). First, while the model purportedly provides the
assessor with a systematic framework for gathering data, the methods
for gathering data must be determined somewhat subjectively. Second,
the model does not provide a "scientific" approach to interpreting data
collected (cf. Dickson, 1975), or selecting an appropriate treatment
strategy (cf. Ciminero, 1977). Finally, the model does not provide a
model for evaluation of the intervention plan (Ciminero & Drabman,
1978). Thus, the professional must develop a measurement system to
evaluate an intervention program.

To address these considerations, systematic research which
demonstrates that certain behavioral assessment procedures (e.g.,
interview, self-report, direct observation, etc.) are reliable and valid
is a high priority (Kratochwill). In the area of selecting a
treatment strategy, efforts are just beginning to select some correct
matches (cf. Ciminero, 1977). One promising approach has been
presented by Kanfer and Grimm (1977) who proposed a differentiation of
controlling variables and behavior deficiencies into categories that
can be matched with available intervention strategies. These five
categories and some sub-components which are used to organize client
complaints presented during an interview assessment are presented in
Table 3.3. While the accompanying change procedures are quite general,
the "match" can lead the professional into areas where various
intervention programs have been quite successful in the past. With
regard to establishing a particular intervention strategy, an
evaluation can be conducted through a functional analysis (e.g., Bijou
& Peterson, 1971; Bijou & Grimm, 1975; Gardner, 1971; Peterson, 1968).
These features would include (1) a systematic observation of the
problem behavior to obtain a baseline, (2) systematic observation of
the stimulus conditions following and/or preceding the behavior, with
special emphasis on antecedent discriminative cues and consequent
reinforcers, (3) experimental manipulation of a treatment which appears
functionally related to the problem behavior, and (4) further
observation to record behavior changes (Peterson, 1968).
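
The observation phases of such a functional analysis can be summarized numerically. The sketch below is a minimal illustration in Python, assuming the target behavior is recorded as a count per observation session; the function name, dictionary keys, and session counts are hypothetical and are not drawn from Peterson (1968) or any of the cited sources.

```python
from statistics import mean


def functional_analysis_summary(baseline, treatment, follow_up):
    """Summarize per-session counts of a target behavior across three phases.

    baseline  -- counts from systematic observation before any manipulation
    treatment -- counts recorded while the treatment is being manipulated
    follow_up -- counts from further observation after the manipulation
    """
    if not baseline or not treatment or not follow_up:
        raise ValueError("each phase needs at least one observation session")
    baseline_mean = mean(baseline)
    follow_up_mean = mean(follow_up)
    return {
        "baseline_mean": baseline_mean,
        "treatment_mean": mean(treatment),
        "follow_up_mean": follow_up_mean,
        # Negative values mean the behavior occurred less often than at baseline.
        "change_from_baseline": follow_up_mean - baseline_mean,
    }


# Invented counts of out-of-seat episodes per session, for illustration only.
summary = functional_analysis_summary(
    baseline=[8, 10, 9],
    treatment=[6, 4, 5],
    follow_up=[3, 2, 4],
)
print(summary)
```

A difference in phase means only describes the recorded change; it does not by itself show that the treatment produced the change.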

This functional analysis strategy bears similarity to research
design procedures, but does not imply that applied research is being
conducted. Credible research would require further methodological
requirements including, but not limited to, those that eliminate
invalidity threats (Kratochwill & Piersel, in press).

Table 3.3

A Behavioral Analysis

I. Behavioral Deficits

A. Inadequate Base of Knowledge for Guiding Behavior
B. Failure to Engage in Acceptable Social Behaviors Due to Skills
   Deficits
C. Inability to Supplement or Counter Immediate Environmental
   Influences and Regulate One's Behavior Through Self-Directing
   Responses
D. Deficiencies in Self-Reinforcement for Performance
E. Deficits in Monitoring One's Own Behavior
F. Inability to Alter Response in Conflict Situations
G. Limited Behavior Repertoire Due to Restricted Range of Reinforcers
H. Deficits in Cognitive and/or Motor Behaviors Necessary to Meet the
   Demands of Daily Living

II. Behavioral Excesses

A. Conditioned Inappropriate Anxiety to Objects or Events
B. Excessive Self-Observational Activity

III. Problems in Environmental Stimulus Control

A. Affective Response to Stimulus Objects or Events Leading to
   Subjective Distress or Unacceptable Behavior
B. Failure to Offer Support or Opportunities for Behaviors Appropriate
   in a Different Milieu
C. Failure to Meet Environmental Demands or Responsibilities Arising
   from Inefficient Organization of Time

IV. Inappropriate Self-Generated Stimulus Control

A. Self-Descriptions Serving as Cues for Behaviors Leading to Negative
   Outcomes
B. Verbal/Symbolic Activity Serving to Cue Inappropriate Behavior
C. Faulty Labeling of Internal Cues

V. Inappropriate Contingency Arrangement

A. Failure of the Environment to Support Appropriate Behavior
B. Environmental Maintenance of Undesirable Behavior
C. Excessive Use of Positive Reinforcement for Desirable Behaviors
D. Delivery of Reinforcement Independent of Responding

Considerations

There has been a phenomenal amount of writing in the area of
behavioral assessment generally but relatively little of this work has
focused on theory, research, and practice in bias in child behavioral
assessment. Books providing discussion of issues relevant to the
assessment of children (e.g., Ciminero, Calhoun & Adams, 1977; Cone &
Hawkins, 1977; Hersen & Bellack, 1976, 1981) and chapters that focus
exclusively on the assessment of children (e.g., Ciminero & Drabman,
19  ; Evans & Nelson, 1977; Kratochwill, 1982) have virtually no
mention of bias or nondiscriminatory assessment.

A number of issues can be raised in child behavioral assessment
that have a direct bearing on assessment bias. A major issue in the
field is defining what behavioral assessment is. A reading of the
recent literature on behavioral assessment will clearly show that it is
remarkably diverse and is becoming even more diverse. A major reason
for this is that behavioral assessment is part of the larger domain of
behavior therapy which is known to be extraordinarily diverse in
theoretical approaches, research methods, and therapy techniques (cf.
Kazdin, 1979). Behavioral assessment has always been closely linked
with the development of behavior therapy (Kratochwill, 1982; Mash &
Terdal, 1981) [e.g., applied behavior analysis, mediational S-R
approaches, social learning theory, and cognitive behavior modification
(Kazdin & Wilson, 1978)]. Each of the areas of behavior therapy has
tended to include its own assessment techniques and procedures
reflective of the theoretical position advanced. Thus, fundamental
differences in assessment have occurred across the theoretical
positions within the field of behavior therapy. Since behavioral
assessment has grown to include such a diverse array of procedures, the
study of bias in behavioral assessment becomes a formidable task
which has yet to be properly explored.

A second and related issue concerns the actual techniques that are
to be considered part of behavioral assessment. Bearing in mind these
variations in theoretical approaches within behavior therapy, the
number of different techniques and procedures subsumed under the rubric
of behavioral assessment is growing incredibly large. Defining what
behavioral assessment is now and what it will be in the future will
likely be determined by evolving theoretical perspectives rather
than a conceptual approach mapping a uniform set of techniques and
procedures. Indeed, attempts to define behavioral assessment have
typically focused on the theoretical differences between traditional
and behavioral approaches (e.g., Hartmann et al., 1979; Nelson &
Hayes, 1979) with the behavioral perspective being identified by
certain conceptual characteristics [e.g., assessing many modalities,
giving primary emphasis to behavior, considering assessment a
sample of behavior, among others (Kazdin & Hersen, 1980)]. Yet, with
different theoretical perspectives on what is to be included within the
domain of behavior therapy, attempts to define the field of behavioral
assessment will become more difficult when comparisons are made with
so-called traditional approaches.

Behavioral assessment has also been said to embrace a conceptual
approach that involves a problem solving strategy in the assessment
process rather than the use of a set of specific measurement strategies
(Kratochwill, 1982; Mash & Terdal, 1981). This conceptual approach
expands the techniques and procedures that can be included in
behavioral assessment. Behavioral assessment has usually been
characterized as consisting of some general domains of assessment,
including interview, self-report, checklists and rating scales,
self-monitoring, analogue assessment, and direct observational measures
(cf. Cone, 1978). Yet, it appears clear that expanding frameworks of
child behavioral assessment within the context of a problem solving
approach allow virtually any technique or test to be considered as
child behavioral assessment. For example, traditional projective test
techniques might be used if they provide information on a client's
cognitions or reinforcer preferences or cognitive style. Also,
traditional tests could be conceptualized as a format to provide
standardized measures of skill performance, such as in the area of IQ
testing (e.g., Nelson, 1980) or achievement and perceptual motor tests
(e.g., Mash & Terdal, 1981), and neurological assessment (e.g.,
Goldstein, 1979). Such a diversity of techniques makes it practically
impossible to speak of bias in behavioral assessment in any meaningful
way. Rather it would seem more appropriate to address bias in
behavioral assessment at a level specific to the type of assessment
instrument or technique employed.

A third issue that has been the source of activity in the field
and has yet to be resolved relates to the psychometric features of
behavioral assessment. Many child behavioral assessment strategies might
be regarded as potentially biased based on the lack of standardized
features in assessment. Possibly due to a rejection of traditional
psychometric approaches, many traditional psychometric features of
behavioral assessment (e.g., norming, reliability, validity,
generalizability) have not been adequately addressed (Hartmann et al.,
1979; Mash & Terdal, 1981).

In the area of norming, for example, concerns have been raised
regarding the rather ambiguous meaning of many assessments conducted
with children without concrete reference to a carefully defined
population of children or the environment in which they are being
assessed. This has raised concerns regarding appropriate use of
data from child behavioral assessments in classification and in
establishing specific goals for intervention. In addition,
there have been relatively few investigations examining the reliability
of child behavioral assessment techniques. Behavioral assessors have
been primarily concerned with establishing inter-observer agreement on
various response measures, but have tended not to establish the
reliability of many assessment techniques using conventional
psychometric criteria developed for this endeavor. Moreover, the
validity of assessment, including such areas as construct,
criterion-related and content validation, has many times failed to
appear in the field. Although many behavioral assessors have focused
on such areas as content validity, the process through which even this
has been accomplished has many times been informal, inadequate, and
usually incomplete (cf. Hartmann et al., 1979).

Developing standardized measures has become even more complicated
due to controversy over traditional psychometric concepts. Cone
(1981), for example, argued that future work in the behavioral
assessment field must focus on a paradigm radically different from the
traditional psychometric models for establishing reliability, validity,
and generalizability. He noted that behavioral assessment procedures
are based on a different conceptual model of individual variability
than traditional approaches. Thus, the traditional psychometric
approaches may be inappropriate for behavioral assessment. As an
alternative, he proposed that accuracy be the primary method for
establishing the credible psychometric dimensions of future assessment
strategies in the field. Based on these issues, it is not at all clear
what specific types of psychometric procedures will be established for
behavioral assessment techniques or even how the field will deal with
devices and procedures that already meet some conventional psychometric
criteria.

Regardless of what criteria for establishing the validity of
behavioral assessment strategies are finally decided upon, it would seem
appropriate that, consistent with the study of traditional test bias,
validity criteria be employed as a framework for studying bias. The
question posed in the study of bias in behavioral assessment would
remain the same as that employed in the study of test bias: Is the
assessment procedure equally valid across groups?
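
The question can be made concrete with a small sketch. Assuming, purely
for illustration, that criterion-related validity is summarized as a
Pearson correlation between assessment scores and a criterion measure,
the coefficient can be computed separately for each group and the
results compared. The group names and score pairs below are invented
for this example and are not data from any study cited here; an actual
analysis would also require adequate samples and a test of the
difference between coefficients.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical (assessment score, criterion score) pairs per group.
# These values are made up solely to illustrate the comparison.
records = {
    "group_a": [(2, 3), (4, 5), (6, 7), (8, 9)],
    "group_b": [(2, 6), (4, 3), (6, 7), (8, 4)],
}

# One validity coefficient per group; a large gap between groups would
# prompt a closer look at whether the procedure is equally valid.
validity_by_group = {
    group: pearson_r([s for s, _ in pairs], [c for _, c in pairs])
    for group, pairs in records.items()
}

for group, r in validity_by_group.items():
    print(f"{group}: r = {r:.2f}")
```

In this invented case the procedure tracks the criterion closely in one
group and hardly at all in the other, which is the pattern the bias
question is meant to detect.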

A fourth issue that has been a source of some concern in the field
relates to the feedback practitioners have been providing regarding the
actual practices of child behavioral assessment in applied settings
(e.g., Anderson, Cancelli, & Kratochwill, in press; Swan & MacDonald,
1978; Wade, Baker, & Hartmann, 1979). One finding has been that
behavioral assessors have tended to use a great number of traditional
assessment devices. Wade et al. (1979) found that nearly half of their
respondents who were members of the Association for Advancement of
Behavior Therapy (AABT) used traditional interviews and a large number
of projective and objective tests. Such factors as agency requirements
for prescribed test use, requirements for testing involving labeling
and classification, and a reported difficulty with implementing
behavioral assessment in applied settings were offered as possible
reasons for this. These results corresponded to other assessment
practices of school psychologists reported by Anderson et al. (in
press), who found that select behaviorally-oriented members of the
American Psychological Association (APA), Division 16, and the National
Association of School Psychologists (NASP) employed traditional testing
procedures and devices. As with the other issues cited above, this
issue further complicates the study of bias in behavioral assessment.

The preceding paragraphs convey something of the issues that have
been raised in child behavioral assessment. These issues reflect only
some of the more general concerns that have emerged and by no means
represent a comprehensive overview. Nevertheless, the concerns that
are now being examined in the field will likely continue to influence
both research and practice for some time to come. Hopefully, the issue
of bias will receive adequate attention in this area.


Sociological Deviance Model Components. Maladaptive behavior has often
been conceptualized within both a medical and social deviance
framework. From a medical perspective, maladaptive behavior can be
studied in the same way as other forms of illness, while the deviance
perspective focuses on maladaptive behavior as the breaking of social
rules (Des Jarlais, 1972).

The term "deviance" is a relatively recent one used to describe
maladaptive behavior, although the breaking of social rules has been a
topic of study for many years (MacMillan, 1977). Terms previously used
in reference to this area have included crime, social pathology, and
social problems. While there have been several theories of deviance
formulated over the years (cf. Des Jarlais, 1972), this section will
focus on labeling theory.

During the 1960's a theory of deviance referred to as labeling
theory gained popularity. The labeling process is of primary
importance within this theory. A major premise of the theory is
that groups identify with and have different expectations for their
conformists and deviants. While conformists are not expected to break
social rules (Des Jarlais, 1972), it is also assumed that the
expectations and evaluations of others can influence an individual's
behavior with regard to following or breaking social rules.

Viewing maladaptive behavior from within a sociological deviance
perspective raises at least three questions (Des Jarlais, 1972, 1978;
Szasz, 1969):

(1) What behaviors are considered maladaptive and by whom?

(2) What social factors are related to conformity or rule-breaking?

(3) What are the relationships between those who enforce social
rules and those who break the rules?

Rule-breaking is generally viewed as a deviation from some norm.
With reference to social norms, it is not an easy task to articulate
exactly what the norm may be for a number of reasons. One problem in
defining social norms is that the standards for acceptable behavior
often vary across time and geographic location. Whether or not two
unmarried adults living together will be viewed as violating some norm,
for example, may depend on when the behavior occurs (in 1945 or 1975)
and where (a small town or a large urban area).

Another obstacle to clearly defining social norms is that given
the same behavior, there may be little agreement on exactly what, if
any, norm has been violated. Homosexuality, for example, may be
regarded as illegal, immoral, sick, or simply as an alternative sexual
preference, depending upon the observer.

Szasz (1969) contended that in attempting to define social norms
we can assume only that they consist of psychosocial, legal, and
ethical components. Behaviors which are considered to be maladaptive,
then, might be thought of as those behaviors which violate some
psychosocial, legal, and/or ethical standard.

Labeling theorists have attempted to unravel the relationship
between rule-breaking and deviance. Merely breaking rules does not
automatically lead to becoming a deviant. Rather, a person must be
labeled deviant before the expectancies which activate the deviant role
come into play (Des Jarlais, 1972). Examples of this process might
include commitment to a penal institution and placement in a
self-contained special education classroom.

Lemert (1962) made a distinction between primary and secondary
deviance. Primary deviance refers to the initial breaking of social
rules, while rule-breaking that occurs after one has been perceived as
a rule breaker is termed secondary deviance (not finding employment
because of a history of placement in programs for retarded persons is
one example of secondary deviance). Other labeling theorists (e.g.,
Becker, 1963) limited the use of the term deviance to situations in
which social expectations for rule-breaking existed.

A deviant label (e.g., mental retardation) does not always follow
rule-breaking (e.g., Mercer, 197[?]). In fact, in the majority of cases
the rule-breaker is probably not labeled. Rule-breaking is common to
everyone; yet, not all rule-breakers are labeled as deviant.
Undoubtedly, there are many individuals who break social rules, but who
are not labeled, because their rule-breaking is undetected (e.g., child
abusers). In other cases, however, individuals who are known to be
rule-breakers may escape the labeling process entirely (Becker, 1963;
Scheff, 1966). It is also possible to become labeled without having
broken any social rules, through association with or being related to a
labeled person, for example (Sever, 1970).

Labeling theory emphasizes the role of those who have the
responsibility for enforcing social rules (e.g., the court system,
psychologists, teachers, parents). These individuals and groups
initiate the labeling process. They have responsibility for deciding
who will play deviant roles, that is, who will be punished, treated, or
rehabilitated (Des Jarlais, 1972).

Many factors are involved in whether or not those who enforce
social rules will confer a deviant label on the rule breaker.
Included are the need of the society to have deviant roles
filled (e.g., Farber, 196[?]), the frequency and visibility of the rule
breaking, the tolerance level of the society for rule breaking (e.g.,
Szasz, 1969), the social distance between the rule breaker and those
who exert social control, the relative power of the rule-breaker in
the system, the amount of conflict between the rule-breakers and agents
of social control, and whether or not anyone has a special interest in
enforcing penalties against the rule breaker (Des Jarlais, 1972, p. 300;
1978). There are also instances in which an individual must be labeled
in order to receive services, as occurs under the guidelines of Public
Law 94-142.

Deviance in Childhood. Deviance in children has not received as much
empirical scrutiny as deviance in adults. Des Jarlais (1978) has
pointed out that there are important differences in the study of
deviance in children and deviance in adolescents and adults. One major
difference is that while adolescents and adults are generally expected
to know the social rules and to comply with them, children are not
always expected to have developed knowledge of social rules. The study
of deviance in childhood, then, focuses upon how children learn the
skills and attitudes to follow social rules, or how children become
socialized (Gold & Douvan, 1969).

Children in American society are exposed to many different
socialization agents. Lippitt (1978) has identified 10 types of
socialization agents, all of which attempt to influence the development
and values of children:

(1) the schools;

(2) organized religion;

(3) leisure time agencies with recreational, cultural, and
character education programs;

(4) the police and courts;

(5) the therapeutic, special correction, and rehabilitation
services (e.g., social workers, counselors, programs for the
handicapped);

(6) employment offices and work supervisors of the young;

(7) political leaders who may have an investment in involving the
young in political activities;

(8) parents;

(9) peers;

(10) the mass media.

At times, there may be different and contradictory definitions or
standards of acceptable behavior among and within these groups. Thus,
ambivalent expectations for behavior may be imposed on the child,
causing stress and often deviant behaviors such as lowered academic
performance, hostility, truancy, and withdrawal.

Criticisms of Labeling Theory. Several weaknesses have been detected
in the initial work in labeling theory. Some sociologists (e.g.,
Matza, 1969; Gove, 1970) have noted the relative lack of emphasis given
to the role of the rule-breaker in the process of becoming deviant, as
compared to the contributions of the agents of social control. Before
the dynamics of the labeling process can be fully understood, the
actions of both rule-breakers and rule enforcers need to be delineated
further.

A second criticism of labeling theory relates to the outcomes of
becoming labeled. At issue here is whether or not being labeled
necessarily results in negative perceptions and expectancies or in
deviant behavior, as is often maintained (e.g., Dunn, 1968). Some
authors (see the volume edited by Gove, 1975b) have maintained that the
labeling process itself can actually prevent deviant acts by either
leading to effective intervention for the labeled person or through a
deterrence effect. The response of proponents of the theory (e.g.,
Becker, [year missing in source]; Kitsuse, 1975; Schur, 1975) has been
that labeling theory is not an attempt to describe the etiology of
deviant acts, but is rather a framework from which to view the actions
of all persons involved in situations in which certain behaviors and
persons are perceived as norm violators. Conclusions on the outcomes
of labeling on the labeled person are mixed because studies in this
area are often plagued with methodological problems (Gardner, 1966;
Jones, 1973).

A third issue in labeling theory is related to the irreversibility
of the labeling process. Once labeled, does a person remain in a
deviant role? Robins (1966) presented evidence that most children
labeled as deviant become conformists as adults. Gove (1970) also
argued against irreversibility, citing the number of mental patients
who are released. MacMillan (1977) contended that because different
demands and expectations are made in different settings (e.g., school,
home), the notion of irreversibility will not always hold. A person
viewed as behaviorally disordered in one setting, for example, may not
be considered deviant in other settings.

These criticisms have clearly revealed that labeling theory is
incomplete, but possibly not invalid (MacMillan, 1977). Future
research may very well focus on factors involved in both the direct and
indirect effects of becoming labeled, and on racial and cultural biases
involved in determining who becomes labeled. The labeling issue is
currently receiving attention as part of the debate over present
procedures for [several words illegible in source] special education
services. The work of Mercer ([years illegible in source]) documented
the process whereby some children are identified and classified mentally
retarded, for example, within the public schools. Mercer's work
highlighted the disproportionate number of black and Spanish-surname
children that were being labeled mentally retarded in the Riverside,
California public schools, thus raising concern over potential bias in
the labeling process.

Labeling and categorization, it has often been argued, provide a useful
way of grouping people who need similar treatment. Yet, categorization
or labeling of some people (e.g., handicapped persons) often does not
result in academic improvement and can, in fact, lead to negative
consequences such as segregation from non-handicapped peers. Clearly,
the current labels and categories within the fields of psychology and
education need to be further examined (cf. Reynolds & Balow, 1974). In
addition, much more work is needed on the direct and indirect outcomes
of labeling and mislabeling and on whether or not the outcomes of being
labeled in one setting generalize to other settings (MacMillan, 1977).
With respect to bias, such research also needs to focus on the possible
differential effects labeling may have across groups.

Ecological Model Components. Ecology is the study of the interactions
between living organisms and their environment. A more precise
definition was provided by Odum (1953), who referred to ecology as the
study of the structure and function of nature. Structure includes a
description of the living population, including life history, number
and distribution of all species in the system, the composition of
non-living things, and the conditions under which the population lives.
[Text illegible in source] environment (Feagans, 1972).

Ecological theorists have attempted to systematically categorize
behaviors of species within their environments. In addition, patterns
of behavior which account for adaptation or maladaptation to the
environment have also been examined (Feagans, 1972).

According to Holman (1977) the field of human ecology is founded
in three areas: 1) plant and animal ecology, 2) geography, and 3)
studies of the spatial distribution of social phenomena. Holman noted
the lack of agreement on basic tenets and principles of the ecological
approach among ecologists, primarily because this area is pursued by
individuals from many different disciplines and perspectives.
Rogers-Warren and Warren (1977) observed that "the meaning of ecology
is still evolving" (p. 4) because psychologists, educators, and
sociologists who share the term ecology have focused on different
aspects of relationships between behavior and environments.

Researchers in the area of human ecology have tried to follow the
systematic and precise classification procedures of biology and have
evolved methodological procedures to collect and analyze data. This
type of research is especially difficult, however, because of the
number of variables which must be unraveled in order to examine human
environments. Nevertheless, this type of approach can add to our
understanding of behavior within a variety of settings.

Ecological psychology is not a new concept. Kurt Lewin used the
term "ecological psychology" in a paper published in 1951. He referred
to ecology as the interaction between psychological and
nonpsychological factors. Later, Roger Barker (1968) used this same
definition in an attempt to formulate a theory of behavior settings or
ecosystems. Barker and his associates (Barker & Schoggen; Wicker;
Willems; years illegible in source) tried to define and classify the
various environments or ecosystems of an individual systematically.
Some examples of human ecosystems would include a party, a
classroom setting, or a meeting. Barker's work showed that these
ecosystems can influence behavior in two ways: through the physical
[word illegible in source] included in a particular ecosystem, and
through the occupants, or human influences, present in the behavior
setting (Smith, Neisworth, & Greer, 1978).

Proshansky, Ittelson, and Rivlin (1970), working in mental
hospitals, developed principles about behavior in settings or
ecosystems based on Barker's approach. They found that environmental
elements such as space, administrative guidelines, furniture, and
number of people had a great deal of influence on the behavior of
patients. Examples of some of their conclusions are:

Human behavior in relation to a physical setting is enduring
and consistent over time and situation; therefore, characteristic
patterns of behavior in a setting can be documented and
justified; changes in these characteristic behavior patterns of a
physical setting can be induced by changing the physical, social,
or administrative structures which define that setting (Feagans,
1972, pp. [page numbers missing in source]).

Proshansky et al. (1970) used this approach to modify the behavior
of institutionalized patients by changing physical objects in their
ecosystem (ward). Gump, Schoggen, and Redl (1963) found that the
behavior of a disturbed child differed markedly across physical
[approximately two lines illegible in source] variability of the
behaviors of one individual across different settings merits
examination. It may be inappropriate (i.e., biased) to [word illegible
in source] the same behavior in other settings, with different cues,
materials, and people. This point has been repeatedly made by authors
addressing the problem of assessing severely handicapped persons
(Brown, Nietupski, & Hamre-Nietupski, 1976). It is probably safe to
generalize [remainder of sentence illegible in source].

Assumptions of the Ecological Model. Prieto, Harth, and Swan (in
press) pointed out that an ecological perspective of human behavior is
based on at least five assumptions about the interaction between an
individual and the environment.

(1) Maladaptive or problem behavior does not exist solely within
a person but in combination with the ecosystem(s) of which the
person is an integral part.
According to this assumption, behavior is not the "exclusive
property of the child" (Rhodes, 1970, p. 449). Rather, behavior is a
result of an interaction or interface between the individual and the
environment. Conditions may be present in the environment which can
actually elicit disturbing behaviors. In addition, individual(s) in the
setting must perceive behaviors as inappropriate. Rhodes ([year
illegible in source]) has [several words illegible in source] dynamic
and medical models which locate disturbance within the individual.
While a stressful environment may contribute to the problem behavior,
no environment or event is significantly stressful in and of itself
unless it is interpreted or responded to as such by the person
himself/herself.

Several different patterns of faulty interaction between an
individual and the environment can be identified. A relatively rare
pattern is one in which a person emits inappropriate behaviors in all
settings (e.g., self-abusive or self-stimulating behaviors). More
commonly, disturbing behaviors occur primarily in only one setting (for
example, a child who stutters in a classroom of highly verbal
children). A third pattern is that in which problems occur because
behaviors which may be adaptive in one setting (an institution) are
perceived as maladaptive in another setting (the community).

(2) Ecological interventions must focus on the setting(s) in
which the maladaptive behavior occurs.

"The objective is not merely to change or improve the child but to
make the total system work" (Hobbs, 1975, p. 114). This assumption
requires assessment of the characteristics of the individual, the
setting characteristics, and the dissonance between them. Due to the
multiple factors involved in such a task, assessment and modification
of human behavior in natural settings remains at a primitive level of
development (Willems, 1969).

(3) Interdisciplinary personnel participate in interventions.

Ecological assessment and intervention require an
interdisciplinary approach or, "someone who can move freely among and
communicate with diverse disciplines in the performance of a liaison
function" (Hobbs, 1975, p. 120). Teachers, parents, medical personnel,
and psychologists often have roles in developing programs within school
settings. Effective intervention within community settings, for
example, would require the participation of lawyers, economists,
employers, and media experts as well.

(4) Ecological interventions must simultaneously focus on many
elements of the system.

Willems (1971, 1977) noted the interdependence of ecological
networks. Specifically, intervention measures designed to effect
change in one element in the system can affect other elements in the
system as well. Modifying a child's behavior in school, for example,
can have unintended and sometimes undesirable effects in the home
setting. To quote Prieto, et al. (in press), "We can never do merely
one thing."

(5) No two individuals and no two settings are the same. This
common sense assumption reflects both the strengths and limitations of
the ecological model. It is precisely what renders ecological
approaches to assessment and intervention so appealing in theory and so
difficult to implement in practice.

Ecological Assessment. Ecological approaches to assessment are
intended to identify the problems with the interface between a person
[text missing in source] so many variables are at play. Baer (1977) has
stated the problem well: "Assessment of phenomenal reality is an
infinite task, like defending against "enemies". We can spend any
amount of our resources on it, we will never finish, we will never
solve the problem, and if we fail eventually, we will not care much
afterward anyway" (p. 116).

Kratochwill and his associates (Petrie, Brown, Piersel, Frinfrock,
Schelble, LeBlanc, & Kratochwill, 1980) presented an ecological
framework for school psychologists involved in the implementation of
applied behavioral psychology in education settings. Based on Willems's
(1974) discussion of unintended effects in intervention work, these
authors presented a conceptual framework for the classification of some
types of unintended effects that may occur in behaviors that are not
directly manipulated by an individual providing intervention services
in educational settings. Some possible types of unintended effects on
behaviors are presented in Figure 3.1. Within this framework, 162
(i.e., 3 x 2 x 3 x 3 x 3) kinds of side effects are possible.
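
The count of 162 is simply the product of the number of levels on each
of the five dimensions of the classification. A minimal Python sketch
of that enumeration follows. It assumes the level labels printed in
Figure 3.1; the dimension names used as dictionary keys are
hypothetical labels supplied here for illustration, and one level that
is not legible in the source figure appears as a placeholder.

```python
from itertools import product

# Level labels as printed in Figure 3.1. The dimension names (the keys)
# are hypothetical; the source figure does not give them legibly.
dimensions = {
    "valence": ["desirable", "neutral", "undesirable"],
    "direction": ["increase", "decrease"],
    "who_is_affected": ["target subject", "others", "both"],
    "unlabeled_dimension": ["(illegible)", "other", "both"],
    "latency": ["immediate", "delayed", "very delayed"],
}

# Each kind of side effect is one choice of level on every dimension.
side_effects = list(product(*dimensions.values()))

print(len(side_effects))  # number of distinct combinations
print(side_effects[0])    # one example combination
```

Listing the combinations explicitly, rather than only counting them,
could help a practitioner check which kinds of unintended effects have
or have not been considered for a given intervention.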


As an option in evaluating such side effects (or second-order
consequences), Locatis and Gooler (1975) presented guidelines that
[...]

[Figure 3.1 not reproduced. Category labels recoverable from the
original figure: (1) Desirable, (2) Neutral, (3) Undesirable;
(1) Target S, (2) Others, (3) Both; (1) Increase, (2) Decrease;
(1) [illegible], (2) Other, (3) Both; (1) Immediate, (2) Delayed,
(3) Very delayed.]

Figure 3.1 Classification of some kinds of unintended effects
that may occur in behaviors that are not manipulated
by the professional. (Source: Petrie, P., Brown, K.,
Piersel, W.C., Frinfrock, S.R., Schelble, M.,
LeBlanc, C.P., & Kratochwill, T.R. The school
psychologist as behavioral ecologist. Journal of
School Psychology, 1980, 18, 222-233. Reproduced
by permission.)

Fourteen guidelines are presented as a format to evaluate second-order
consequences (see Table 3.4).

There is no single assessment tool appropriate for evaluating each
relevant variable in a given ecosystem. However, there seems to be
increasing realization that setting variables must be considered in
order to clarify and remediate behavioral and learning problems. A
brief discussion of some of these variables and recently developed
assessment devices follows.

Assessment of Behavior in Single Settings. Smith, Neisworth, and Greer
(1978) and Rogers-Warren (1977) outlined strategies for assessing
specific target behaviors and relevant setting characteristics. These
include the following:

1. Identify the target behavior: The behavior of concern by name,
topography, and function for the subject in the target setting is
identified.

2. Assess the physical setting in which the target behavior occurs:
Here factors such as instructional space, architectural design,
furniture, and physical cues for the target behavior are
identified.

3. Assess instructional arrangements: The task here is to evaluate
the curriculum content, teaching methods, materials, and media.

4. Assess the social situation within the setting: Teacher-teacher,
teacher-child, and child-child interactions, reinforcement
contingencies, and staff time and competence are relevant here.

5. Assess the setting in relation to any existing or anticipated
intervention: Here the question of whether or not the physical and
social situation will facilitate or hinder a particular intervention in
that setting must be considered.

In their text, Smith, et al. (1978) have provided examples of
assessment checklists which may be used to assess these types of
variables in educational settings.

Table 3.4 is available in Sattler, J.M. Assessment of children's
intelligence and special abilities (2nd ed.). Boston: Allyn & Bacon,
1982.

Moos (1972), working in psychiatric ward settings, developed the
Ward Atmosphere Scale (WAS). This device can be used to measure
sociocultural aspects of ward environments relative to posthospital
outcome. Included on the WAS are measures of patients' involvement in
their program, autonomy of patients, order and organization of the ward
program, and degree of staff control.

Assessing Behavior Across Settings. Of primary importance here is to
determine the effect of different settings on behavior. Assessment
devices available for use in single behavior settings are much more
common than assessment techniques designed for assessment across
settings (Prieto, et al., in press).

Behavior differences across behavioral settings have been observed
and described by several researchers (e.g., Gump, Schoggen, & Redl,
1963; Tars & Appleby, 1973). Thomas and Chess (1977) developed
interview schedules and behavior checklists which allowed them to
compare the perceptions of adults concerning the characteristics of a
child's behavior across settings. Their work showed that behaviors
which were viewed as problem behaviors in one setting (school) were
sometimes not perceived as problems in another (home). They concluded
that perceptions of behavior as being appropriate or inappropriate were
dependent on the expectations and value systems held by the observer.
[...] behavior might be to change the setting or to place the person in
a different setting rather than attempting to change the person
(p. 141). If this is so, attempts to assess behavior across settings
are important and merit more attention and development than have been
expended to date.

Assessing Community Factors. It seems logical to assume that there
exists a strong relationship between the values held by a community and
the types of services and programs the community provides. Assessing
the effects of community and/or culture on behavior patterns and
settings, therefore, becomes relevant within the ecological model.
Here, assessment must focus on in-school and out-of-school support
services, clusters of settings, and delivery systems through which
services are made accessible (Smith, et al., 1978).

In order to assess the role of the community in contributing to
maladaptive interactions, it is necessary to study which persons become
labeled, how identification occurs, what service delivery systems are
available, how they affect the patterns of treatment, and the
effectiveness of treatment according to multiple criteria (Prieto, et
al., in press). Lewis (1973) pointed out that before we can intervene
at the community level, we must establish methods to assess
bureaucratic regulations and guidelines which are related to the
funding of programs. In addition, methods by which service delivery
systems can avoid discord between the individual and his settings need
to be examined.

Assessment of community services is, perhaps, the most difficult
[...] developed. There have been, however, some noteworthy efforts in
this area. Apter (1977) offered a model for community education based
on ecological theory which provides a starting point for assessment of
the effectiveness of a community's educational system. Some assumptions
of Apter's model are that learning should continue throughout life,
facilities should be used efficiently, community participation in
educational decision-making should be facilitated, programs which meet
the unique needs of children and adults should be provided, personnel
should realize that education is not the sole property of any one
agency, and research and program development should address the
totality of a person's education (p. 368).

Gillespie-Silver (1979) developed a checklist for assessment of
local community services. Included were industries, ethnic groups,
agencies, funded programs, nonprofit agencies, parent groups, medical,
legal, and psychological services and their interactions. A second
checklist evaluates services provided by the state and region.
Information is also provided about services at the national level. In
addition, guidelines are presented for developing integrated service
programs for children. Checklists are provided for use in developing
an educational plan for a child utilizing community resources and for
developing strategies for implementing the plan which consider the
resources and support systems available.

Smith, et al. (1978) presented an example of an inventory which
can be useful in assessing community components related to students,
services, and professionals. These authors also provide guidelines for
comparing student needs with existing as well as unavailable community
[...]
one of the most commonly used means to assess the appropriateness of
community services for individuals with problem behaviors. The
legislation passed during the last few years (e.g., PL 94-142 and
Section 504 of the Rehabilitation Act of 1973) would tend to support
this revealing observation.

Considerations in the Use of Sociological and Ecological Models for
Nonbiased Assessment. The differentiating labels "sociological" and
"ecological" were used in describing the previous two conceptual models
of human functioning. Such an apparent distinction may not be viable,
however, in practical applications derived from these models because
interventionists take both environmental and individual variables into
account, although to differing degrees.

The sociological and ecological perspectives on maladaptive
behavior both evolved partly in reaction to the restrictions of other
models of human behavior. For example, because traditional
interventions (e.g., psychotherapy) were typically conducted outside an
individual's natural environment, two problems emerged. Any positive
changes developed in therapy were not necessarily generalized to other
settings. If changes in behavior were generalized, they were not
always relevant to other settings. Criticisms of the behavioral model
also centered on the generalizability of gains and the narrow focus of
intervention. Modifying target behaviors might not be sufficient to
alter negative patterns of behavior.

There was also concern about the fairness of viewing the person as
[...] solely within the person [...] significant responsibility for the
problem. The sociological and ecological perspectives gained support
during the 1960's when the Vietnam War challenged beliefs of what was
normal, what was deviant, which behaviors and persons were good and
which evil (Prieto, et al., in press). The arguments generated by the
events of those years led many to the conclusion that deviance is
relative, that is, deviance depends on the values of the persons making
the judgements and the context within which behaviors are viewed. This
atmosphere undoubtedly led many professionals to the conclusion that
disturbance was created by and assessed in situational contexts, and
that effective and ethical treatment required altering those contexts
as well as the behavior of the individual.

At the present time there are relatively few formalized systems of
assessment based solely on sociological and/or ecological theory.
Present assessment approaches influenced by these models of human
functioning are eclectic in nature. This is simultaneously a strength
and a weakness. The strength lies in the multiple views brought by the
expertise of many disciplines. The mere categorization of people or
products acquired, which all too often characterizes typical
educational and psychological assessment reports, can be avoided. The
weakness lies in the lack of any systematic formulation and application
of intervention strategies based on sociological/ecological assessment
data. It takes, perhaps, less time and energy to focus change efforts
on the individual than on the environment in which she/he functions.
More attempts are needed to integrate the insights provided by [...]
particular, there is a need for (1) instruments and methods for
assessment of relevant variables in context and (2) a technology for
assessing the interaction of the selected variables.

The outcome of such efforts could be a more complete and more
usable description of behaviors in the context in which they occur that
could then be translated into viable intervention strategies. Such an
approach to assessment would seem to be consistent with criteria for
non-biased assessment.

In addition to the potential contribution of the sociological and
ecological models to the development of alternative strategies for
collecting non-biased data useful for educational decision making, these
models also raise issues that challenge the validity of traditional
norm-referenced tests. By highlighting the environmental impact on the
way children learn and perform, these models draw our attention to the
situation-specificity of many behaviors that we often casually treat
as immutable. Given the cultural differences between minority and
nonminority children, and consequent potential differences in learning
and performance styles, conclusions drawn regarding the non-biased
nature of these tests from technical information available to date may
be perceived as premature. Continued research from within the scope of
these models should yield a better understanding of the
generalizability of test data across settings and the potential
undesirable by-products of interventions designed from their use,
especially as both apply to culturally different children.

Finally, research investigating the assessment issues raised by
these models will help in the design of alternative methods for [...]
of the construct validity of the tests presently employed through the
generation of convergent evidence (Campbell & Fiske, 1959).

Pluralistic Model


Components. In recent years a pluralistic model has been identified in
the literature (Mercer, 197?; Mercer & Ysseldyke, 1977). Technically,
this model is more appropriately a conceptual approach that assists in
organizing various assessment strategies that are more responsive to a
culturally pluralistic society than any single conceptually derived
assessment strategy. Nested within this conceptual approach is an
attempt to address the cultural components of the assessment process.
Mercer and Ysseldyke (1977) outlined some assumptions of this approach:

The pluralistic model assumes that the potential for learning is
similarly distributed in all racial-ethnic and cultural groups.
It assumes that all tests assess what the child has learned about
a particular cultural heritage and that all tests are culturally
biased. Persons socialized in a cultural heritage similar to
those in the test's standardization sample tend to perform better
on the test than those not reared in that cultural tradition
because of differences in their socialization. A variety of
procedures have been designed to estimate the level of performance
which the child would have achieved if the cultural biases in the
testing instrument and procedures were controlled (p. 83).

A number of different measures have been developed that fall within the
[...] of various instruments that fall into these categories can be
found in Jensen (1980) and Sattler (1982) and are reviewed in more
detail in Chapter 7 in this volume. For example, the Black Intelligence
Test for Children (BITCH) by Williams (1974), the Enchilada Test (Ortiz
& Ball, 1972), which has 31 multiple-choice items that deal with
experiences common to Mexican-American barrio children, and the
test-train-test strategy presented by Budoff (1972) represent some of
the more common techniques. Other examples of so-called culture fair
tests include the Leiter International Performance Scale, Cattell's
Culture-Fair Intelligence Tests, and Raven's Progressive Matrices (see
Samuda, 1975 for other examples).

Another set of procedures within this model uses multiple normative
frameworks for various groups. Although these normative frameworks
can be based on local norm-based tests, the most systematic and
identifiable strategy in this area is the SOMPA developed by Mercer and
Lewis (1978). The SOMPA is actually a system of tests developed to
assess children from culturally different backgrounds. The SOMPA does
not just represent a pluralistic model, but rather incorporates aspects
of a medical model, social system model, and what is being called here
a pluralistic model (Mercer, 1979). Sociocultural Scales have been
developed within the context of this Pluralistic Model. These scales
have the following purpose:

The Sociocultural Scales determine how much an individual's world
differs from the Anglo core culture. Four scales locate an
individual in a three dimensional intercept of socioeconomic
status, degree of Anglo cultural assimilation, and degree of
integration in Anglo social systems. Once an individual's
sociocultural group is gauged by the Sociocultural Scales, a
normal distribution of WISC-R scores is predicted for that
sociocultural group by means of a multiple regression procedure
(Figueroa, 1979, p. 33).

There has been a considerable amount of material published on the SOMPA
and much of this is reviewed in Chapter 7.
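
The quoted passage does not spell out the computation, so the Python
sketch below is only one plausible reading of "a normal distribution
of WISC-R scores is predicted ... by means of a multiple regression
procedure." It assumes a linear prediction of the group mean from four
scale scores and then re-expresses an observed score relative to that
predicted distribution on a mean-100, SD-15 metric. Every number, the
names `INTERCEPT`, `WEIGHTS`, and `RESIDUAL_SD`, and the rescaling
step itself are hypothetical and are not taken from the SOMPA
materials.

```python
# Hypothetical regression weights for the four Sociocultural Scales.
# None of these values come from the SOMPA manual.
INTERCEPT = 70.0
WEIGHTS = (0.2, 0.1, 0.1, 0.1)
RESIDUAL_SD = 12.0  # assumed spread of scores around the predicted mean


def predicted_mean(scale_scores):
    """Predicted mean WISC-R score for a given sociocultural background."""
    return INTERCEPT + sum(w * s for w, s in zip(WEIGHTS, scale_scores))


def score_relative_to_group(observed, scale_scores):
    """Re-express an observed score on a mean-100, SD-15 metric,
    relative to the distribution predicted for the child's group."""
    z = (observed - predicted_mean(scale_scores)) / RESIDUAL_SD
    return 100.0 + 15.0 * z


scales = (50, 40, 60, 50)  # hypothetical Sociocultural Scale scores
print(predicted_mean(scales))
print(score_relative_to_group(83, scales))
```

Under this reading, a child whose observed score equals the mean
predicted for his or her sociocultural group is placed at 100, whatever
that score is on the standard norms.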

Considerations. Culture-fair or culture specific tests used within the
context of the pluralistic paradigm have been designed to meet the
spirit of being non-biased or nondiscriminatory. Generally, such tests
have been developed to minimize language, reading skill, speed and
other factors that may be culture specific and to minimize cultural
differences affecting test content and test taking behaviors (Oakland &
Matuszek, 1977). There are, however, several problems with such
strategies. To begin with, language is only one dimension on which
various tests could be discriminatory. Such factors as social skills
and test taking behaviors may be even more important issues. Even if
language is a primary concern, nonverbal tests may not be culture-fair
because they depend on cognitive behaviors that are related to language
systems (Cohen, 1969). Reviews of this literature also suggest that
ethnic minorities do not perform any better on so-called culture-fair
tests than on more traditional procedures (Arvey, 1972). Nonverbal
tests may even be more difficult than verbal tests for certain groups,
such as blacks (Sattler, 1982).

Second, there is some consensus that no test can really be
regarded as culture-fair (Anastasi, 1961; Vernon, 1965). Moreover, as
Sattler (1974) noted, "...no test can be culture-fair if the culture is
not fair" (p. 34). Tests can also be ordered on a continuum from
highly culture loaded to highly culture reduced (Jensen, 1980). Such
tests would differ in the dimensions presented in Table 3.5. As Jensen
(1980) notes, changing a test on any one or a combination of these
dimensions will not necessarily make the various tests less culturally
biased for a certain cultural group. In prediction on a criterion,
each test must be empirically examined for bias. However, most tests
that can be characterized as culture-reduced have not been subjected to
empirical work equivalent to the more common measures used in
educational settings (e.g., WISC-R). After reviewing a number of
culture-reduced tests, Jensen (1980) concluded:

None of these attempts to create highly culture-reduced tests,
when psychometrically sound, has succeeded in eliminating, or even
appreciably reducing, the mean differences between certain
subpopulations (races and social classes) in the United States
that have been noted to differ markedly on the more conventional
culture-loaded tests. On the other hand, some culture-reduced
tests show negligible differences between certain widely divergent
linguistic, national, and cultural groups, which suggests that
these tests are indeed capable of measuring general ability across
quite wide cultural distances. The fact that such culture-reduced
tests do not show smaller mean differences between blacks and
whites (in the United States) than do conventional culture-loaded
tests suggests that the racial difference in test scores is not
due to cultural factors per se (p. 713).

Table 3.5

Dimensions of Cultural Loading on Various Tests

Culture Loaded                          Culture Reduced

Paper-and-pencil tests                  Performance tests
Printed instructions                    Oral instructions
Oral instructions                       Pantomime instructions
No preliminary practice                 Preliminary practice items
Reading required                        Purely pictorial
Pictorial (objects)                     Abstract figural
Written response                        Oral response
Separate answer sheet                   Answers written on test itself
Language                                Nonlanguage
Speed tests                             Power tests
Verbal content                          Nonverbal content
Specific factual knowledge              Abstract reasoning
Scholastic skills                       Nonscholastic skills
Recall of past-learned information      Solving novel problems
Content graded from familiar to rare    All item content highly familiar
Difficulty based on rarity of content   Difficulty based on complexity
                                        of relation eduction

Source: Adapted from Jensen, A.R. Bias in mental testing. New York: The
Free Press, 1980, p. 637. Reproduced by permission.
Finally, within the SOMPA there is still little evidence that its use
will lead to educational decisions that are not racially or culturally
discriminatory (Oakland, 1979). Various criticisms of the SOMPA have
been presented in the 1979 School Psychology Digest (Reschly, 1979) and
reviewed by Sattler (1982). Because we discuss this assessment
procedure in more detail in Chapter 7, issues are not discussed here.
However, it should be emphasized that there is little empirical data to
support its use.

Summary and Conclusions

In this chapter we provided an overview of conceptual models of
human functioning and their implications for assessment bias.
Specifically, in the chapter we reviewed the medical model,
intrapsychic disease model, psychoeducational process or test-based
model, behavioral model, sociological deviance model, ecological model,
and pluralistic model. Each of these models was discussed within the
context of its components, assumptions, and features that make it
unique and identify it as a separate conceptual framework for work in
the assessment field. In addition to this, each model was critiqued
within the context of methodological and conceptual issues.

Several major issues need to be taken into account when
considering conceptual models of human functioning and their
implications for assessment bias. First of all, each model provides
somewhat different sets of data to be identified in the assessment
process. This is important within the context of what aspects of data
might be ignored or deemphasized in the assessment process. For
example, in many models assessment occurs prior to actual intervention
services and therefore does not always address specifically the kinds
of outcomes produced once services are identified. Second, a major
problem across all conceptual models relates to the lack of a research
base for many of the theoretical or philosophical features identified.
This is a major problem inasmuch as adherence to a certain model might
be based more on subjective or philosophical bias than on empirical
analysis. Finally, we believe there is some benefit in the future to
considering a broader conceptual base for assessment, taking into
account each of the different models. Specifically, each of the
different models has certain features to assist in the assessment
process that another one may not. Thus, individuals assessing children
should consider that various models, taken together, take into account
a broader range of conceptual and methodological features to further
reduce assessment bias.

Chapter 4
Technical Test Bias

Many individuals have assumed that differences in the mean
performance of various groups on tests of cognitive functioning,
especially tests of intelligence, automatically connote bias (e.g.,
Alley & Foster, 1978; Chinn, 1979; Mercer, 1976; Williams, 1974). This
concept of bias assumes that all human populations are essentially
equal with respect to their cognitive functioning and that tests, to be
nonbiased, should reflect this similarity. As concluded by Alley and
Foster (1978), for example, ". . . a test should result in
distributions that are statistically equivalent across the groups
tested in order for it to be considered nondiscriminatory for those
groups" (p. 2).

This concept of test bias has been challenged by many (e.g.,
Jensen, 1980; Reynolds, 1982). Reynolds (1982) argues that such a
position "conveys an inadequate understanding of the psychometric
construct (of validity) and issues of bias" (p. 187, parentheses
added). Jensen (1980), referring to this concept of test bias as the
egalitarian fallacy, calls this position scientifically unwarranted.
When such a position is adopted, one removes from the realm of science
all chance of empirically determining whether group differences
actually exist or are a function of test bias. Group differences could
never be studied since any differences found would be, by definition,
the result of biased measures. Reynolds and Gutkin (1980) point out
that ethnic group differences in mental test scores have been a
constant and well documented psychological phenomenon. Those holding
the above concept of bias would deny the existence of this phenomenon
and by necessity conclude that these reported differences are a
function of biased measures.

In opposition to those who hold as biased all tests on which
performance is associated with group membership are those who argue
that these differences must be examined empirically to determine if
findings are a function of bias or real group differences in the
measured construct. It should be noted that the empirical study of the
validity of tests to determine if group mean differences are real or a
function of bias in no way implies racial bias on the part of the
researcher. To the contrary, by studying differences among groups,
bias is avoided by examining any a priori assumptions regarding
possible measured differences. Whether one holds an a priori belief
that the differences are real or not, and the implications one can draw
from the findings, are a function of the theory one adopts. This is the
nature of science. Researchers in this area are a diverse group of
people, some out to validate testing in its present form, others out to
reform current practice (Cole, 1981). Regardless of their motives,
most are willing to accept science as the arbiter of their
disagreements.

Much  of  the  research  that  has  been  generated   in  the  study  of  test 
bias  has  employed  validation  theory  to  help  provide  meaning  to  measured 
group  mean  differences.     Validity  is  an  estimate  of  the  degree  of 
accuracy with which a test measures what it purports to measure. The
more valid a test is, then, the more accurately we can determine if
real group differences exist. Given such direction for the study of
bias, Alley and Foster's (1978) conclusion that all groups should have
equivalent distributions regardless of the test's validity is a non
sequitur. It is valid tests that will tell us if all groups have
equivalent distributions.

The Concept of Validity

While there are many who have adopted the study of validity to
provide structure to their empirical search for bias in tests, efforts
have gone in different directions as a consequence of how the concept
of validity has been defined and which aspects of validity are
emphasized. Traditional operationalizations of the concept of validity
have resulted in validation being segmented into three types, namely
(1) content validity; (2) construct validity; and (3) criterion-related
validity.

Content validity is that type of validity that provides
information on how well test items sample the content of the domain of
behaviors that are expressions of the construct measured. How
accurately scores on a test represent the construct it purports to
measure is an issue of construct validity. The third type of validity,
criterion-related validity, is established for the purpose of
specifying the accuracy in predicting performance on a criterion to
which the construct purports to be related.

There  are  those  who  suggest  that  this  conceptualization  of 
validity is problematic (e.g., Cronbach, 1980; Messick, 1975). One of
the potential dangers of using this tripartite definition is the risk
of fragmenting the larger notion of validity to the extent that we lose
sight of the more comprehensive picture (Reynolds, in press). The
notion of validity cannot be embodied in any single type of validity,
regardless of what the use for the test is purported to be. If so
embodied, the potential by-product may be the exclusive, or near
exclusive, dependence on any one type to establish the validity of a
test.

In response to this problem, both Messick (1975) and Cronbach
(1980) have encouraged that we bring together all types of validity and
recognize them as aspects of one validity, construct validity. These
authors suggest that the validity of a test can only be established
when one studies various types of information relevant to the accuracy
of the test score. Such a conceptualization encourages diversity in
the way we think about, and consequently study, validity. For example,
Messick (1980) identifies 17 different types of validity that may be
valuable in studying the accuracy of tests. Within such a
conceptualization, bias in a test is determined by an examination of a
variety of evidence all bearing on the construct validity of the test.
Evidence of bias in any one area would classify the test as biased, at
least on this dimension.

The  study  of  bias   through  traditional  methods,   that   is,  by 
employing   those  methods  common  to  the  study  of  content,  construct,  and 
criterion-related validity, can be referred to as the study of
technical test bias. A large body of literature has emerged in recent
years that has studied bias in a technical sense. Researchers in this
area differ in how they classify the different efforts. Consistent
with the three types of validity traditionally defining the concept,
some   identify  bias   in  this  area  as  either  content  bias,   construct  bias, 
or  criterion-related/predictive  bias   (Reynolds,  1982).     Others  choose 
to  classify  the  three  types  of  validity  within  two  classes  of  bias 
(Cole, 1981; Jensen, 1980). The first includes those studies of bias
related to the use of the test (i.e., related to criteria external
to the test). This encompasses criteria employed in the study of
predictive validity. The second entails the study of bias that is
internal to the structure of the test. This class, which employs
criteria used in the study of both content and construct validity, is
what is presently being called construct bias.

For the purposes of this report, the latter two-class scheme will
be employed. Since the criteria in both classes are employed for the
purpose of helping to verify whether a test has construct validity, we
have chosen to call what has more commonly been referred to as
"predictive test bias" external construct bias. For the same
reason, we refer to the literature that examines the internal structure
of a test as the study of internal construct bias.


External Construct Bias

When using predictive validity criteria in the study of bias,
the question of external construct bias relates to how useful the test
is in its prediction to some criterion for individuals with differing
group membership. Thus, one is not interested in whether or not groups
have the same mean score, but in whether the test predicts the criterion
similarly for all individuals, regardless of group membership. Models
used to study prediction bias are statistical models most
comprehensively based on the linear regression of the criterion
variable on the test score. Three major features of the regression
system (slopes, intercepts, and standard errors of estimate) are often
studied.

One of the most comprehensive definitions for this type of bias is
offered by Jensen (1980), who writes:

A test with perfect reliability is a biased
predictor if there is a statistically significant
difference between the major and minor
groups in the slope b_yx, or in the intercepts
k, or in the standard error of estimates SEy
of the regression lines of the two groups.
Conversely, an unbiased test with perfect
reliability is one for which the major and
minor groups do not differ significantly in
b_yx, k, or SEy (p. 379).

A circumstance of no external construct bias, therefore, exists when
the regression equations for all groups are equivalent. Thus, any
prediction to a criterion from a test score would be as accurate for
all members of all groups regardless of the score they receive on the
measure of the predictor variable. This condition, referred to as
homogeneity of regression across groups, simultaneous regression, or
fairness in prediction (Reynolds, 1982), is depicted in Figure 4.1.

Note that in this condition, two individuals from differing groups
scoring similarly on the test would receive similar predictions (Y1,
Y2, or Y3) regardless of whether the pair scored at x1, x2, or x3.
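
The three regression features named in Jensen's definition can be computed descriptively for each group. The following is a minimal sketch in Python, not a procedure taken from the sources reviewed here; the function name fit_line and the two small data sets are hypothetical and chosen only for illustration. It estimates the slope, intercept, and standard error of estimate by ordinary least squares. A full analysis would also test the group differences for statistical significance, which this sketch does not do.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares regression of y on x.

    Returns (slope, intercept, standard_error_of_estimate).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted line.
    residual_ss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    see = math.sqrt(residual_ss / (n - 2))
    return slope, intercept, see


# Hypothetical predictor (x) and criterion (y) scores for two groups.
group_a = ([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.0])
group_b = ([1, 2, 3, 4, 5], [1.0, 2.1, 2.9, 4.2, 4.8])

params_a = fit_line(*group_a)
params_b = fit_line(*group_b)

print("group A: slope=%.2f intercept=%.2f SEy=%.3f" % params_a)
print("group B: slope=%.2f intercept=%.2f SEy=%.3f" % params_b)
```

In these invented data the two groups have essentially the same intercept but clearly different slopes, so the regression systems are not homogeneous.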

Slope Bias

The slope of a regression line (i.e., the regression coefficient
b in the regression equation) is the rate of change in the criterion
variable as a consequence of a change in the predictor variable. Slope
bias occurs when the regression coefficients are different for the
different groups under investigation; in other words, when the slopes
of the regression lines differ. Figure 4.2 graphically depicts an
example of slope bias.

As can be seen in the figure, two different regression lines are
evident for the different groups. If the regression line for group A
were used to predict performance on the criterion variable for
individuals in both groups A and B, systematic error (i.e., bias) would
occur. For example, if two individuals, one from group A and one from
group B, were to obtain a score of x1 on the predictor and the
regression line for group A were used, a prediction of a score of Y1
would not contain systematic error for the member of group A and would
be an overprediction for the member of group B. The more accurate
prediction for the group B member (i.e., the one without systematic
error) would be Y'1. If the same pair of individuals were to score
either x2 or x3 on the predictor, the prediction of their scoring Y2 or
Y3, respectively, on the criterion would be similarly biased for the
member of group B and unbiased for the group A member if the group A
regression line were used to predict. The more accurate prediction for
members of group B who score x2 or x3 would be Y'2 or Y'3,
respectively. Note that in the example depicted, the accuracy of the
prediction decreases as a function of increased scores.

If a single regression line made up of a combination of the
regression lines were employed to predict the criterion, the prediction
for all members of both groups would be biased. This would be true
regardless of the number of individuals in each group. Such is the
case if slope bias is evidenced on tests normed on a sample of
individuals from groups A and B proportionally selected to represent
the makeup, in number, of the total population.
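
The growth of the prediction error with the predictor score can be illustrated numerically. The sketch below assumes two hypothetical regression lines with a common intercept and different slopes; the numbers are invented and are not read from Figure 4.2. It computes the error made for a group B member when the group A line is used.

```python
def predict(slope, intercept, x):
    """Predicted criterion score for predictor score x."""
    return intercept + slope * x


# Hypothetical regression lines: same intercept, different slopes.
LINE_A = {"slope": 0.8, "intercept": 10.0}
LINE_B = {"slope": 0.5, "intercept": 10.0}


def overprediction_for_b(x):
    """Error for a group B member when the group A line is used."""
    using_a = predict(LINE_A["slope"], LINE_A["intercept"], x)
    using_b = predict(LINE_B["slope"], LINE_B["intercept"], x)
    return using_a - using_b


# Errors at three increasing predictor scores (standing in for x1, x2, x3).
slope_errors = [overprediction_for_b(x) for x in (10, 20, 30)]
print(slope_errors)
```

Under these assumptions the overprediction equals the slope difference times the predictor score, so it increases steadily from the lowest score to the highest.
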

Intercept Bias

Simply stated, the intercept is that point at which the regression
line crosses the Y axis. It is the constant in a regression equation
and is represented by k. Intercept bias occurs when k differs for
different groups. This situation is depicted in Figure 4.3. As can be
seen, when such are the circumstances (and there are no differences in
slope), the regression lines for the two groups are parallel. If the
regression line of one group is used to predict the performance of
members from the other group, a constant under- or overprediction will
occur. This systematic error, by definition, is test bias. For example,
if the group A regression line were used to predict the performance of
our two individuals from the last example (i.e., one from group A and
one from group B) and if the pair scored either x1, x2, or x3 on the
predictor, a prediction of Y1, Y2, or Y3, respectively, would be made.
These predictions would contain no systematic error and be more
accurate for the group A member than for the group B member, for whom
they would be biased. Predictions made for the group B member using the
group B regression line would be Y'1, Y'2, or Y'3, respectively. If a
common regression line made up of scores from group A and group B were
used to predict, then all predictions made for members of both groups
would contain systematic error.

Figure 4.3 - An example of intercept bias
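
The constant nature of the error under intercept bias can be checked with a short calculation. This is only a sketch with invented values for the common slope and the two intercepts; none of the numbers come from Figure 4.3.

```python
# Hypothetical parallel regression lines: common slope, different intercepts.
COMMON_SLOPE = 0.6
INTERCEPT_A = 12.0
INTERCEPT_B = 8.0


def error_for_b_using_a(x):
    """Error for a group B member when the group A line is used."""
    predicted_with_a = INTERCEPT_A + COMMON_SLOPE * x
    predicted_with_b = INTERCEPT_B + COMMON_SLOPE * x
    return predicted_with_a - predicted_with_b


# The error is evaluated at three different predictor scores.
intercept_errors = [error_for_b_using_a(x) for x in (10, 20, 30)]
print(intercept_errors)
```

Because the slope terms cancel, the error equals the difference between the two intercepts at every predictor score, which is the constant over- or underprediction described in the text.
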

Bias in the Standard Error of Estimate

The third feature of regression that is used as an indicator of
external construct bias is the standard error of estimate, SEy. The
SEy is an index of the amount of error there is in the prediction.
Thus, for example, if one plots the scores that are observed on the
criterion for a group of individuals all of whom scored x on the
predictor and, in accordance with the regression line, were predicted to
have scored Y, a normal distribution of scores around the predicted
score Y would be the result. The standard deviation of that
distribution is the SEy. The SEy, therefore, helps determine the range
of potential scores within which one can predict with certain degrees
of confidence. If the SEy is different for different groups, the test
is considered biased. In Figure 4.4 the distribution of estimates for
two groups with the same regression lines but different SEy's is
depicted.

If the observed scores on Y for all the members of group A were
plotted, a distribution of errors in estimation would result that would
be different from the distribution of errors in estimation plotted for
group B. Therefore, using the SEy for group A to estimate the scores
of members of group B would result in a reduced range of estimates and
would be biased. A SEy derived from a combination of scores from both
groups A and B would be biased for both groups if, in fact, the SEy's
are different for each.
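
How the SEy sets the range of likely criterion scores can be sketched as follows. The example assumes normally distributed errors and uses the familiar approximation of the predicted score plus or minus 1.96 SEy for a 95% band, ignoring sampling error in the regression line itself. The predicted score and the two SEy values are hypothetical.

```python
def prediction_band(predicted, see, z=1.96):
    """Approximate band around a predicted score: predicted +/- z * SEy."""
    return predicted - z * see, predicted + z * see


PREDICTED_SCORE = 50.0  # same regression line, so same prediction
SEE_GROUP_A = 4.0       # hypothetical SEy for group A
SEE_GROUP_B = 8.0       # hypothetical SEy for group B

band_a = prediction_band(PREDICTED_SCORE, SEE_GROUP_A)
band_b = prediction_band(PREDICTED_SCORE, SEE_GROUP_B)

width_a = band_a[1] - band_a[0]
width_b = band_b[1] - band_b[0]
print(band_a, band_b)
```

With these values the band based on group A's SEy is half as wide as the band group B actually requires, which is the reduced range of estimates referred to above.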

Note that in the definition of external bias, systematic error in
the predictions for one or more groups on any one of the three factors
(i.e., slope, intercept, or SEy) of regression connotes bias. Of
course, bias would also occur if systematic error in prediction
resulted from group differences on any combination of the three
features. As an example, Figure 4.5 depicts different regression lines
for groups A and B that differ in both slope and intercept. In this
circumstance, if two individuals, one from group A and one from group
B, were to obtain a score of x1 on the predictor, a prediction of Y1 on
the criterion would contain no systematic error for members of group A
and would be an underestimation for members of group B if the group A
regression line were used to predict the scores. The prediction for the
group B member without systematic error would be Y'1. If two
individuals were to score x2 on the predictor, then regardless of what
equation one predicted from, the prediction Y2 would be as accurate. If
both scored x3 and the regression line for group A were used to predict
Y3, the prediction would contain no systematic error for the member of
group A but would be an overprediction for the group B member. The
prediction containing no systematic error for the member of group B
under such conditions would be Y'3. As can be seen, using either
regression equation, or a combination of both, would result in
systematic error and consequently would be biased.

Figure 4.5 - An example of slope and intercept bias.

The ways one may treat a test found to be biased are many.
Discarding a test that has validity, even though it contains systematic
error, may not be the best alternative, especially if one is left with
using subjective data to help in decision making. Given the five
elemental statistics
that  can  vary  between  subgroups  (i.e.,   the  validity  coefficient,  the 
standard deviations of both the predictor and criterion variables, and
the reliability coefficients of the predictor and criterion variables),
Jensen (1980) argues that each may be examined to determine where bias
lies. If not serious, statistical adjustments might be made. Other
alternatives would include renorming or using different tests. These
issues, however, are part of the questions regarding fair or unbiased
use in the decision-making process and are dealt with in Chapter 6.

Unreliable Tests

One of the unique characteristics of Jensen's definition is its
reference to tests with perfect reliability. Linn and Werts (1971)
cogently point out that tests without perfect reliability may predict
equally well for various groups but would not predict equally well (and
thus be biased) if their accuracy were increased by increasing their
reliability. Therefore, some conclude that before external construct
bias can be established, corrections need to be made in the test scores
to account for the unreliability of both the predictor and criterion
measures (e.g., Hunter & Schmidt, 1976; Jensen, 1980).

Such a conclusion is important for both theoretical and
statistical reasons. From a theoretical point of view, the whole
notion of predictive validity is important in that it provides evidence
regarding how well the test measures what it purports to measure. As
pointed out previously, all types of validity can be viewed as aspects
of construct validity. To regard a test as unbiased
simply because it does not have the reliability necessary to show it to
be a biased measure of the construct would seem to violate the major
reason why one would want to measure its relation to a criterion
variable in the first place (Jensen, 1980). This argument holds
despite the fact that the test's utility in predicting to the criterion
variable would remain equally practical across groups.

From a statistical point of view, each of the parameters of
interest in the study of external construct bias is sensitive to test
reliability. Consequently, just as a biased test may appear to be
unbiased due to error in measurement, so too may an unbiased test
appear to be biased. "Whatever statistical discriminability a test
has, it is only accentuated by improving its reliability" (Jensen,
1980, p. 385). Jensen (1980) describes the potential effects on the
interpretation of slope, intercept, and SEy bias when either the
predictor or criterion variable has less than perfect reliability.

When the reliability of the measure of the predictor variable is
less than perfect, bias will occur in circumstances where the means of
the two groups differ. This will be the case even if the less than
perfect reliabilities are equal for the two groups, and it will
evidence itself in intercept bias. When the reliability of the
criterion measure is less than perfect, the SEy increases. If the
reliabilities across groups differ appreciably on the criterion
measure, the outcome will be bias.
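
Why unreliability in the predictor shows up as intercept bias when group means differ can be made concrete with a short calculation. The sketch rests on classical test theory assumptions that the text does not spell out: both groups share one true regression line Y = k + b*T, the observed predictor is X = T + error, and the within-group reliability r_xx is the same in both groups. Under those assumptions the within-group regression of Y on X has slope b*r_xx and intercept k + b*(1 - r_xx)*M, where M is the group's predictor mean. The function name and all numbers are hypothetical.

```python
def observed_line(true_intercept, true_slope, reliability, group_mean):
    """Within-group regression of the criterion on a fallible predictor.

    Returns (slope, intercept) under classical test theory assumptions.
    """
    slope = true_slope * reliability
    intercept = true_intercept + true_slope * (1.0 - reliability) * group_mean
    return slope, intercept


# Same true line (k = 2.0, b = 0.5) and same reliability (0.8) for both
# groups; only the predictor means differ (100 versus 85).
line_high = observed_line(2.0, 0.5, 0.8, 100.0)
line_low = observed_line(2.0, 0.5, 0.8, 85.0)

intercept_gap = line_high[1] - line_low[1]
print(line_high, line_low, intercept_gap)
```

The observed slopes are identical, but the group with the higher predictor mean obtains the higher intercept even though no bias exists in the true scores. The gap equals b*(1 - r_xx) times the difference in means and vanishes when the reliability is perfect.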

As pointed out above, it is recommended that one should first
correct for attenuation before concluding that external construct bias
exists.
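
One standard form of the correction for attenuation, applied to a validity coefficient, is Spearman's formula: the observed correlation divided by the square root of the product of the predictor and criterion reliabilities. The sketch below illustrates that formula only; the authors cited above may apply corrections in other ways (for example, to the regression parameters), and the input values are hypothetical.

```python
import math


def disattenuate(r_xy, r_xx, r_yy):
    """Spearman's correction for attenuation.

    r_xy: observed predictor-criterion correlation
    r_xx: reliability of the predictor
    r_yy: reliability of the criterion
    """
    if not (0.0 < r_xx <= 1.0 and 0.0 < r_yy <= 1.0):
        raise ValueError("reliabilities must lie in the interval (0, 1]")
    return r_xy / math.sqrt(r_xx * r_yy)


# Hypothetical observed validity of .42 with reliabilities of .81 and .64.
corrected_validity = disattenuate(0.42, 0.81, 0.64)
print(round(corrected_validity, 3))
```

Because the divisor is at most 1, the corrected coefficient is never smaller in absolute value than the observed one, and the correction grows as either reliability falls.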

Research on External Construct Bias

Empirical literature in the area of external construct bias has
been accumulating rapidly in recent years. While initial efforts
focused mainly on the areas of employment selection and college
admissions, several studies have recently appeared in the literature
relevant to the prediction of school performance.

Studies  in  the  area  of  external  construct  bias  are  potentially 
fraught with problems. Included among the more serious are: 1) the
unreliabilities of the predictor and criterion measures, 2) differing
selection criteria for members from the various groups under
investigation, 3) inadequate floors or ceilings of tests used for one or
all groups studied, 4) inappropriate statistical analysis, and 5)
criteria that may reflect differential performance due to experiential
factors (e.g., coaching or special training). Complications of these
sorts need to be kept closely in mind when evaluating research findings
in this area.

Two methods of analysis of group difference have most commonly
been used in this literature to lend evidence regarding the potential
bias of a test. The first method compares predictive validity
coefficients for different groups, while the second examines possible
differences in the regression equations derived for the various groups.
With respect to the former, only partial information is made available
in answering questions of external construct bias when bias is viewed
according to the comprehensive definition recommended herein. While it
is true that if validity coefficients are different across groups then
regression systems must differ, it is also true that if the validity
coefficients are the same, this does not rule out differences in
regression systems and thus does not rule out external construct bias.

When validity coefficients are used, they are of most value when
the coefficients are compared with each other to determine
the significance of any difference between them. Those investigations
that examine separately the validity coefficients for each group to
determine if they significantly differ from zero are often erroneous
(Humphreys, 1973). In addition to the usual problem of differences in
sample size often evidenced in these studies, such a procedure fails to
provide empirical evidence regarding the key question as to whether or
not the validities among the groups differ from each other.
Investigations of this sort have come to be known as studies of
single-group validity, that is, of a test that has validity for one
group and not others.
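
The difference between testing each coefficient against zero and testing the coefficients against each other can be shown with a small example. The sketch uses the Fisher z transformation with a normal approximation, which is one common way to make both tests; it is offered as an illustration and is not the procedure of any particular study cited here. The correlation of .30 and the sample sizes of 200 and 30 are hypothetical.

```python
import math


def two_sided_p(z):
    """Two-sided p value for a standard normal deviate."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def p_against_zero(r, n):
    """Test of a single correlation against zero (Fisher z approximation)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return two_sided_p(z)


def p_difference(r1, n1, r2, n2):
    """Test of the difference between two independent correlations."""
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (math.atanh(r1) - math.atanh(r2)) / standard_error
    return two_sided_p(z)


# Identical validity coefficients, very different sample sizes.
p_large_sample = p_against_zero(0.30, 200)
p_small_sample = p_against_zero(0.30, 30)
p_between_groups = p_difference(0.30, 200, 0.30, 30)
print(p_large_sample, p_small_sample, p_between_groups)
```

Here the coefficient is significant in the large sample and not in the small one, which would be read as single-group validity, yet the two coefficients are identical and the test of their difference shows nothing. The apparent finding is produced entirely by the difference in sample size.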

The second method of analysis commonly employed in this area, as
mentioned above, involves an analysis of the regression equations of
the various groups under investigation. Such an analysis, to encompass
the comprehensiveness of Jensen's definition of external construct
bias, would have to examine the slope, intercept, and SEy of the
regression systems. Researchers who analyze regressions across
groups, like those who compare validity coefficients, are in search
of differential validity. Differential validity refers to a test that
has some, yet differing, validity for all groups.

Employment Testing. A substantial amount of research has been
conducted in the area of employment testing. One of the first major
reviews of this literature was published by Boehm (1972). This review
examined 13 studies, all involving comparisons between samples of
blacks and whites and all involving the study of validity coefficients.
The occupations for which the tests were used ranged from general
maintenance worker to administrative personnel. Of the 160
comparisons of validity coefficients that were made from a total of 57
predictor tests and 38 criterion measures, 4% reported differential
validity, a less than chance occurrence (p < .05). Of the 38 criterion
measures, however, most were subjective ratings of job performance. To
examine the possibility that there was a difference between the results
of those studies employing subjective criteria versus those that
employed more objective tests, Schmidt, Berner, and Hunter (1973)
examined 12 of the 13 studies included in the Boehm (1972) review plus
several additional ones. Schmidt et al. (1973) found no difference in
the outcomes of the studies when examined according to the subjectivity
involved in the criterion measures.
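
The sense in which 4% of 160 comparisons is "less than chance" can be shown with simple arithmetic. The sketch assumes each comparison is tested at the .05 level and, as a simplification, treats the comparisons as independent, which comparisons drawn from the same studies generally are not.

```python
COMPARISONS = 160       # validity coefficient comparisons in the review
ALPHA = 0.05            # significance level assumed for each comparison
REPORTED_RATE = 0.04    # proportion reported as showing differential validity

# Number of "significant" differences expected if no true differences exist.
expected_by_chance = COMPARISONS * ALPHA

# Approximate number implied by the reported 4% rate.
implied_by_report = COMPARISONS * REPORTED_RATE

print(expected_by_chance, implied_by_report)
```

About eight significant differences would be expected from sampling error alone, while the reported rate corresponds to roughly six, so the observed frequency falls below the chance expectation.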

In the Boehm (1972) review, a significant number of studies
evidenced single-group validity, with tests demonstrating validity for
whites and invalidity for blacks. However, since the minority sample
sizes were usually smaller than the white sample sizes, these early
findings were suspect. Since then, four studies (Boehm, 1977; Katzell
& Dyer, 1977; O'Connor, Wexley & Alexander, 1975; Schmidt, Berner &
Hunter, 1973), correcting for this error, have demonstrated no evidence
of single-group validity (Schmidt & Hunter, 1981).

Similarly, some of the earlier studies that found differential
validity in higher than chance numbers (e.g., Boehm, 1977; Katzell &
Dyer, 1977) have been shown to be methodologically flawed (Hunter &
Schmidt, 1978). Differences in validity coefficients have been
demonstrated to be a function of Type I bias resulting from the data
preselection techniques employed. Avoiding this difficulty, Bartlett,
Bobko, Mosier, and Hannan (1978) analyzed 1,190 pairs of validity
coefficients for groups of blacks and whites and
Hunter, Schmidt, and Hunter (1979) examined 712 pairs of validity
coefficients for similar groups, and both found a less than chance
occurrence of significant differences in the comparisons.

In  a  review  of  the  homogeneity  of  regression  between  racial  groups 
in  studies  done   in  the  employment  area,   Ruch   (1972)   examined  20  studies 
that  allowed  for  the  completion  of  such  a  reanalysis  of  the  data.  The 
results of the reanalysis prompted the author to conclude that
differential validity occurred at only a chance level of frequency.
Citing flaws in the analysis, Jensen (1980) reanalyzed the data
reported by Ruch (1972) and concluded that there was no evidence of
slope or SEy bias. However, his reanalysis identified a highly
significant trend of intercept bias. The results consistently
suggested higher white than black intercepts, with overpredictions for
blacks occurring when either a white or common white and black
regression equation was used to predict the criterion.

While  the  studies  reported  above  in  the  area  of  employment  testing 
have  all   focused  on  black  versus  white  in  their  comparisons  of  validity 
coefficients  and  regression  systems,  one  study  comparing  whites  and 
Hispanics  has  recently  been  reported   in  the  literature.     This  study 
conducted  by  Schmidt,   Pearlman,  and  Hunter  (1980)   indicates  similar 
results,   that  is,  no  differential  validity  between  groups. 

College Admissions. In the area of college admissions, most studies
have looked at potential bias in the Scholastic Aptitude Test (SAT) as
it predicts college grade point average. The SAT is a timed
multiple-choice paper-and-pencil test that contains two main parts: the
verbal and the mathematical. Bias
in the SAT has been suggested because of the mean difference in the
performance of white and various minority groups, with the white group
scoring higher. The importance of this potential bias comes from the
fact that most selective colleges in the United States use the SAT as a
criterion for admissions. It should be noted, however, that recent
evidence provided by Hargadon (1981) suggests that a wide range of
criteria for college admissions is presently used in this country, much
more so than in western European countries, which depend heavily on
admission test results.

As in the employment testing literature, most of the studies conducted
in this area examine differential validities in the performance of
blacks and whites. Those studies examining validity coefficients, on
the whole, report no differential validity. For example, in an early
study conducted by Stanley and Porter (1967) comparing the validity of
the SAT in predicting freshman GPA in three black and 15 predominantly
white state colleges in Georgia, no differences were found in the
validity of the SAT between races when used for this purpose. Yet
conclusions from this study must be drawn carefully. A floor effect in
the performance of black students on the SAT was found and, as Stanley
and Porter (1967) indicate, the test was too difficult for
approximately one-third of the population of black students. In
addition, the study combined heterogeneous samples. It is highly
inferential to conclude that the criterion score (i.e., GPA) means
the same thing across institutions. Stanley and Porter (1967)
cautiously concluded that the results of their study suggest that the
use of SAT scores in predicting freshman GPA was as valid for blacks
attending black colleges as for whites attending predominantly white
colleges.

More recent studies comparing regression equations have supported
the contention that the use of the SAT in predicting GPA is not biased
against blacks when a white or common regression equation is used
(e.g., Centra, Linn & Parry, 1970; Cleary, 1968; Davis & Kerner-Hoeg,
1971; Davis & Temp, 1971; Kallingal, 1971; Pfeifer & Sedlacek, 1971;
Temp, 1971; Wilson, 1970). To the contrary, a trend in many of these
studies indicates that bias, when evidenced, was in favor of blacks
(i.e., overpredicted performance on the GPA criterion when using a
white or common regression equation).

In a review of the homogeneity of regression of GPA on SAT scores,
Linn (1973) concluded that in 22 studies of racially integrated
colleges the actual GPA of blacks was overpredicted in 18 of them. In
no instance did the SAT underpredict black GPA, and in most cases the
overprediction was a function of intercept bias. These findings
led a panel of the American Psychological Association to
conclude that in regular college programs within integrated colleges,
with GPA as the criterion, the use of standardized tests for all
practical purposes leads to comparable predictions for black and white
students (Cleary, Humphreys, Kendrick & Wesman, 1975).
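
The link between intercept bias and overprediction can be illustrated
with a small sketch. The data below are hypothetical and come from none
of the studies cited; the function names are likewise illustrative
assumptions. Two groups share the same slope, but the second group's
own regression line has a lower intercept, so a line fitted to the
first group, or to both groups pooled, predicts a higher criterion
score for the second group than its members actually earn.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares slope and intercept for a single predictor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def mean_prediction_error(xs, ys, slope, intercept):
    """Mean of (predicted - actual); a positive value is overprediction."""
    return mean(intercept + slope * x - y for x, y in zip(xs, ys))


# Hypothetical data: test scores in hundreds, criterion on a GPA-like
# scale. Both groups share a slope of 0.4, but group B's own line has
# the lower intercept (0.2 versus 0.5).
a_x = [4, 5, 6, 7, 8]
a_y = [0.5 + 0.4 * x for x in a_x]
b_x = [3, 4, 5, 6, 7]
b_y = [0.2 + 0.4 * x for x in b_x]

slope_a, intercept_a = fit_line(a_x, a_y)              # group A's line
slope_c, intercept_c = fit_line(a_x + b_x, a_y + b_y)  # common line

error_b_from_a_line = mean_prediction_error(b_x, b_y, slope_a, intercept_a)
error_b_from_common = mean_prediction_error(b_x, b_y, slope_c, intercept_c)
error_a_from_common = mean_prediction_error(a_x, a_y, slope_c, intercept_c)

print(error_b_from_a_line, error_b_from_common, error_a_from_common)
```

In this constructed case the group A line overpredicts group B by the
full intercept difference, while the common line overpredicts group B
and underpredicts group A by smaller amounts.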

In a review of two studies (Goldman & Richards, 1974; Goldman &
Hewitt, 1975) comparing the homogeneity of regressions between white
and Mexican-American students using the SAT as the predictor measure
and GPA as the criterion measure, Jensen (1980) reports that in both
instances the SAT had lower validity for Mexican-American students, and
in one study (i.e., Goldman & Richards, 1974) when using a
white-derived regression equation, there was a slight tendency to
overpredict Mexican-American GPA. This study also indicated that the
use of the SAT added little to a prediction made from high school GPA
alone.

School Testing. Early efforts to study external construct bias ignored
the area of school testing. However, in recent years more attention
has been drawn to the use of ability tests to predict academic
achievement. Several reasons for this attention have been offered (see
Chapter 1), but whatever the reasons, several recent research studies
have been the result.

Some of the early single-group validity studies reported by
Sattler (1974) of individually administered intelligence tests (i.e.,
the Stanford-Binet and the WISC) supported their validity for samples
of black children as well as white children. A more recent
single-group validity type study was conducted by Oakland (1979), who
reported the validity coefficients of a variety of readiness tests in
predicting scores on several achievement measures for groups of black,
white, and Mexican-American preschool children from middle and lower
SES backgrounds. While no statistic was used to examine differential
validity, the size of the coefficients suggested that potential bias
may be occurring in the use of readiness tests in predicting non-white
performance from white or common regression lines. As pointed out by
Reynolds (1982), the lower correlations for non-white groups, together
with their lower mean criterion score, suggest bias favoring
non-whites in predicting early school achievement. However, as
discussed previously, single-group validity studies, at most, only
allow for inferences across groups.
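
For illustration, one conventional statistic for comparing validity
coefficients from two independent groups is Fisher's r-to-z test. The
sketch below is an assumption about how such a comparison could be
run, not the procedure of any study cited here, and the coefficients
and sample sizes are hypothetical.

```python
import math


def fisher_z_difference(r1, n1, r2, n2):
    """Test whether two independent correlations differ.

    Returns the z statistic and a two-sided p value from the standard
    normal distribution.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)   # Fisher r-to-z transform
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))    # two-sided tail area
    return z, p


# Hypothetical validity coefficients for two groups of 103 children.
z_stat, p_value = fisher_z_difference(0.60, 103, 0.30, 103)
print(round(z_stat, 2), round(p_value, 4))
```

A significant difference in validity coefficients speaks only to
differential validity; it does not by itself show whether the
regression lines of the two groups differ in slope or intercept.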

In a study of the differential validities of seven preschool tests
in predicting scores on the Metropolitan Achievement Test (MAT) for
samples of black and white preschool children, Reynolds (1978, reported
in Reynolds, 1982) conducted an extensive analysis to compare validity
coefficients and to examine homogeneity of regression. The MAT was
administered one year after the predictor measures. The results for a
total of 112 validity coefficients revealed a less than chance number
of significant differences. In a study of the 112 regression systems,
a significant bias was found across both sex and race, with racial bias
being significantly more prevalent than sex bias. A further analysis
of the data indicated the bias most often occurred in two measures, the
Preschool Inventory and the Lee-Clark Reading Readiness Test. The
Metropolitan Readiness Test (MRT) showed no bias. When bias occurred
it always acted to overpredict black male performance and underpredict
white female performance when using a common regression equation.

Several recent studies have appeared in the literature examining
the use of the WISC and WISC-R as predictors of scholastic attainment.
Much of this research seems to have been spurred by public concern over
a disproportionate representation of minority children in classes for
the mentally handicapped. Many of these studies have examined the
differential validity of these tests in predicting academic achievement
as defined in standardized measures such as the MAT. For example,
Reschly and Sabers (1979) compared the validity of the WISC-R in
predicting performance on the reading and math subtests of the MAT for
whites, blacks, Mexican-Americans and Native American Papagos.
Consistent with the trend in those studies examined above, an analysis
of the regression systems indicated bias resulting in an overprediction
of minority performance when a common regression equation was used.
The authors found the bias, for the most part, to be a function of
differences in intercepts.

Reynolds and Hartlage (1979) predicted reading and arithmetic
achievement scores using both the WISC and WISC-R as predictors across
samples of blacks and whites. No significant differences were found in
predicting achievement for the two groups using different regressions.
In a similar study comparing Mexican-American and white children,
Reynolds and Gutkin (1980) found the WISC-R performance IQ to differ
across groups in its prediction of arithmetic achievement. The
difference resulted in an overprediction of arithmetic achievement for
Mexican-American children using a common regression equation. There
were no differences across groups in the regression equations derived
for the WISC-R verbal scale and full scale in the prediction of
mathematics, reading and spelling, and the performance scale of the
WISC-R in predicting reading and spelling.

In addition to studies examining external construct bias in the
WISC and WISC-R, the Stanford-Binet was recently studied (Bossard,
Reynolds & Gutkin, 1980) to determine if bias exists in its prediction
of academic achievement for black and white children. The results of
this study indicated that no systematic bias in prediction was found in
an analysis of both the validity coefficients and the regression
equations.

While the studies cited above have all been interested in bias in
individually administered intelligence tests as they predict a
standardized measure of academic achievement, a few studies have
focused on criterion measures that are more global in nature,
encompassing criteria reflecting the child's overall performance in
his/her role as student. To some, such criteria are viewed as a more
valuable standard since it is this standard that intelligence tests
are often asked to predict. Individual tests of intelligence are often
used in the schools to help in making decisions regarding special
education placement. Clinicians employing the test for this purpose
usually infer from its use not only a child's ability to perform on a
restricted type of academic task encompassed in a standardized
achievement test but also his/her ability to function effectively in
the future in his/her role as student. It has likewise been argued that
both an IQ test and a standardized measure of academic achievement are
measuring the same thing, i.e., the learned ability to master academic
type skills and to perform them under standardized conditions (Garcia,
1981). Mercer (19  ) classified such measures as tests of school
functioning and argued that if one wants to use this test as a
predictor of school functioning then it should be related to more
global criteria than standardized achievement tests. Mercer,
therefore, views measures such as the teacher's subjective judgments of
the child's ability to perform across the range of subjects, as embodied
in a report card, to be a more useful criterion to predict when using
tests of school functioning (i.e., WISC-R).

While much professional sentiment can be engendered for 1) the
hypothesized learned nature of the measured construct and 2) the
necessity for examining the utility of the measure for the purpose it is
intended (i.e., special education diagnosis and placement), we must
conclude that these studies are more appropriately a topic of bias or
unfairness in utility and will be considered in Chapter 6. Whether or
not one agrees with the nature of the construct or its implied origin,
the purpose of the studies reported in this chapter is to provide
information on the validity of the tests in measuring constructs. It
is hypothesized that these tests measure a construct and that this
construct is purported to be related, by definition, to the acquisition
of academic skills. To help validate the construct, establishing its
relationship to that criterion is an important step.

If the predictor and criterion measures are measuring the same
thing and the criterion measure (i.e., standardized academic tests)
isn't what it's purported to be, then a better criterion measure should
be designed. If the construct is not what it purports to be, then,
again, that's a concern that needs to be established empirically
through other forms of validity relating to the integrity of the
construct. The relationship of the construct to the outcomes of
decisions that are made with its use, while considered herein a part of
the definition of assessment bias, is not the purpose of the body of
literature reviewed in this chapter.
Considerations

As a review of the studies in this section shows, the more recent
investigations in all three areas examined are approaching the study
of external construct bias more comprehensively, through a comparison of
regression equations across groups, than earlier studies had done. This
more recent trend has encouraged a flurry of investigations over the
past decade that have provided rather consistent conclusions. Bias
occurs infrequently in prediction, and when it does, it usually results
in the overprediction of minority performance when a majority or common
regression equation is used to predict the criterion. Several
explanations have been offered for this bias.

As we discussed previously, one of the effects of unreliable
predictor measures is an increase in the difference between intercepts
if the mean performance of the groups differs. This would be the case
regardless of whether there are differences in the reliabilities or
not. We also pointed out in our review that when bias in the
regression systems occurred, it was usually a function of intercept
bias. For unreliability to account, in part, for the reported bias,
the intercept of the minority group would have to be below the
intercept of the white group, with overprediction occurring when a
white or common regression line is used to predict minority
performance. Such is the case in the studies reviewed. Hunter and
Schmidt (1976) have suggested that as much as half of such bias
witnessed in the literature can be a function of unreliability in the
predictor measure. As pointed out above, such error can be
statistically eliminated by using estimated true scores in the analysis
as opposed to test scores.
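
The arithmetic behind this argument can be sketched under classical
test theory assumptions that are ours, not the cited authors': both
groups share one true-score regression line, the predictor has the
same reliability in each group, and measurement error is uncorrelated
with true scores and with the criterion. The estimated true score is
taken to be Kelley's regressed estimate computed from each group's own
mean. All numerical values are hypothetical.

```python
def observed_score_line(alpha, beta, reliability, group_mean):
    """Within-group regression of the criterion on OBSERVED scores.

    Assumes the true-score line is criterion = alpha + beta * true_score.
    Unreliability shrinks the slope and shifts the intercept by an
    amount that depends on the group's mean.
    """
    slope = beta * reliability
    intercept = alpha + beta * (1.0 - reliability) * group_mean
    return slope, intercept


def estimated_true_score(x, reliability, group_mean):
    """Kelley's estimate: regress the observed score toward the group mean."""
    return group_mean + reliability * (x - group_mean)


# Hypothetical values: one true-score line, two groups with different means.
ALPHA, BETA, RELIABILITY = 0.5, 0.4, 0.8
MEAN_HIGH, MEAN_LOW = 6.0, 5.0

slope_high, intercept_high = observed_score_line(ALPHA, BETA, RELIABILITY, MEAN_HIGH)
slope_low, intercept_low = observed_score_line(ALPHA, BETA, RELIABILITY, MEAN_LOW)

# Equal slopes, but the lower-scoring group has the lower intercept.
intercept_gap = intercept_high - intercept_low

# Using estimated true scores recovers the single shared line.
x = 4.0
via_observed = intercept_low + slope_low * x
via_true = ALPHA + BETA * estimated_true_score(x, RELIABILITY, MEAN_LOW)

print(intercept_gap, via_observed, via_true)
```

Under these assumptions the intercept gap equals the slope times one
minus the reliability times the difference in group means, so it
vanishes only when the predictor is perfectly reliable or the group
means are equal.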

If up to half of the bias can be accounted for by test
unreliability then what can account for the remaining half? Jensen
(1980) points out that this remaining half is the result of "the
predictor variable not accounting for enough of the variance in the
criterion variable to account for the major-minor groups' mean
difference on the criterion" (p. 514). Jensen (1980) also concludes
that there are, at present, no explanations for the phenomenon but he
postulates, along with others, that it may be a function of differences
in the criterion measure caused by such factors as achievement
motivation, interests, work and study habits, and personality traits
affecting persistence, emotional stability, and self-confidence. Any
one or combination of these may influence the criterion measure.

We would like to further point out that these same factors may
also account for differences in mean performance on the predictor
across groups as well. If such were the case and these factors were
uncorrelated with the predictor and influenced the criterion to a similar
degree, then no external construct bias would be evident. If they
influenced the criterion measure more than the predictor, then bias in
the direction observed in the above studies would result if the
influence of such factors were negative.

It should also be noted that if factors such as those identified
above did influence the predictor measure, it would have serious
implications for the construct validity of the test. Additionally, if
these factors were irrelevant to the predictor, they would not
necessarily be identified in the study of the internal structure of the
test during an examination of internal construct bias. The reasons for
this will become apparent in the next section when we take a look at
methods employed to study internal construct bias. In addition,
further light will be shed on this problem when we discuss alternative
approaches to the study of construct validity in Chapter 5.

Internal Construct Bias

When looking at the external evidence of construct bias as we did
in the last section, the test for which we were interested in judging
bias was examined as it predicted to some criteria to which it was
purported to be related. When examining evidence of internal construct
bias, no such external criteria are used. Rather, the responses of
groups are examined to determine if differences across groups are
evidenced in the structure of the response patterns or in the specific
items that make up the test. Criteria of the sort traditionally
employed in evaluating content and construct validity are used in
making judgments regarding internal construct bias.

Sometimes the criteria traditionally used to judge construct
validity are external (e.g., another construct measure). Thus, there
is often quite a bit of similarity between criteria of this sort and
the criteria employed in examining external construct bias.
Conceptually, this is accounted for by the fact that predictive
validity is but one aspect of construct validity. Whether or not
investigations employing external criteria are included under the
traditional heading of predictive or construct validity is, to a
certain extent, arbitrary. An often made distinction is that the
criteria used to establish predictive validity are a set of behaviors
more often acquired by one who possesses more of the construct than
less. In addition, the behaviors are usually identified as more
specific in nature and perceived to serve more of a practical purpose
in determining the usefulness of the test. The use of academic skills
as criteria to validate a measure of intelligence is one example.

The external criteria used in those investigations purported to
lend evidence to construct validity are usually more general and less
practical in nature. For example, the validation of one construct is
often partly accomplished by relating it to another construct to which
it is hypothesized to be related.

For the purposes of the present review, this distinction is not
made. The study of bias as evidenced by differences in group
performance on an external criterion is reviewed under the heading of
external construct bias. Traditional construct validation studies
employing factor analytic procedures are included as part of internal
construct bias.

The reason for this distinction is in keeping with our perception
of the purpose of traditional validation procedures. From this
perspective, the value of traditional validation procedures is in
telling us how well a test measures a construct, not its usefulness in
decision making. The question as to whether or not a test is useful in
decision making is very complex, much more so than can be answered with
predictive validity studies of a traditional variety. Therefore, the
study of technical test bias is presently perceived as answering
questions related to how well the test measures the construct across
groups, not how effective it is in predicting outcomes of complex
decisions. Since the value of any construct rests ultimately in its
use and cannot be divorced from this purpose, we also include in our
perception of bias those practices that result in differential
treatment across groups. This discussion will be taken up under the
topic of outcome bias in Chapter 6.

In the remainder of the present chapter, a number of common
statistics generated from test responses are examined to determine the
presence of internal construct bias. Evidence of bias in any one of
the areas reviewed provides grounds for labeling a test suspect.
Examined in the following sections are internal consistency bias,
factor structure bias, and item bias. The last of these biases, item
bias, is examined according to the four methods most popularly employed
in its analysis: the group x items interactions method, the item
response theory method, the distractor analysis method, and the
judgmental method. In addition to these issues of internal construct
bias, a brief discussion of what is called "facial bias" in the
literature will be included.
Internal Consistency Bias

One statistic of a test that can be examined across groups to
determine if there is evidence of internal construct bias is the
intercorrelations among the test's items. This statistic is a
reflection of the internal consistency of a test and in measurement
terms is one indication of the test's reliability. If the groups under
investigation each evidence a high reliability coefficient, then what is
being measured is being measured with high accuracy for the groups and
no bias is suggested. A discrepancy in the reliability coefficient
between two groups would suggest either (1) the items are more
difficult for the group with the lower estimate, (2) the item
intercorrelations are different, or (3) both item difficulty and item
intercorrelations explain the difference. If differences in the
internal consistency estimates exist, it would therefore be necessary
to find out if they can be accounted for by item difficulty since such
differences could be the result of items being truly more difficult for
one group than another. If such were the case, then the test would not
be biased. If item difficulty was not the reason for differences in
the estimates, then such differences can be attributed to the item
intercorrelations. In this case, the test would be considered biased
as it would suggest the possibility that the test is not measuring the
same thing across groups. An internal consistency reliability
coefficient of approximately .   is an arbitrary standard often used to
connote a good test in this regard.
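
As an illustration of how an internal consistency estimate might be
computed separately for each group, the sketch below uses coefficient
alpha, one common internal consistency statistic. The choice of alpha
is our assumption; the studies reviewed below may have used other
estimates (e.g., split-half coefficients). The item scores are
hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Coefficient alpha for a list of examinees' item-score lists."""
    k = len(rows[0])                                   # number of items
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])   # variance of total scores
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)


# Hypothetical three-item scores for two small groups of examinees.
group_a = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]  # items agree perfectly
group_b = [[1, 2, 1], [2, 1, 3], [3, 4, 2], [4, 3, 4]]  # items agree less well

alpha_a = cronbach_alpha(group_a)
alpha_b = cronbach_alpha(group_b)
print(round(alpha_a, 2), round(alpha_b, 2), round(alpha_a - alpha_b, 2))
```

A between-group difference in the estimates would then be examined
further, as described above, to see whether item difficulty or the
item intercorrelations account for it.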

With respect to the WISC-R, two recent large scale investigations
of internal consistency were conducted by Oakland and Feigenbaum (1979)
and Sandoval (1979). In the Sandoval study, internal consistency
estimates for the various WISC-R subtests were computed for over 1,000
whites, blacks and Mexican-Americans. Variations in estimates across
groups ranged upward to only .04 except for the object assembly
subtest. On this subtest differences in the internal consistency
estimates were .16 between whites and blacks and .2W between
Mexican-Americans and blacks, with blacks having higher estimates in
both comparisons.

Similar findings are reported in the Oakland and Feigenbaum (1979)
study using similar groups. Differences in internal consistency
reliability estimates ranged upward to .06 for all but the object
assembly subtest. In the Oakland and Feigenbaum study, internal
consistency reliability on the object assembly subtest for whites was
higher than for Mexican-Americans and blacks. The estimates for
whites, blacks and Mexican-Americans were .74, .64, and .67,
respectively.

With respect to other tests, Jensen (1974) reports estimates of
internal consistency for the Peabody Picture Vocabulary Test (PPVT) and
the Raven's Colored Progressive Matrices for similar groups. Estimates
for his samples on these tests were also similar. On the PPVT
estimates ranged from .95 to .97 across groups; on the Raven, between
.86 and .91. For whites, blacks, and Mexican-Americans, Green (1972)
reports similarly high consistent results ranging from between .   and
.   for scores on the California Achievement Test.

In addition to the internal consistency estimates on the WISC-R,
Oakland and Feigenbaum (1979) also report estimates for the
Bender-Gestalt Test for blacks, whites and Mexican-Americans. These
estimates suggested similar internal consistencies ranging from a low
of .7? for Mexican-Americans to a high of .8? for whites.

In another study, Dean (1977) reported on the internal consistency
of the WISC-R for Mexican-American children who had been tested by
white examiners. In comparing these estimates to those reported in the
predominantly white standardization sample of the WISC-R by Wechsler
(1974), Dean found, albeit higher, similar and consistent estimates.

From the evidence reported to date, there appear to be no marked
group differences in the average degree of accuracy in measuring
whatever the test measures. As pointed out earlier, however, we can
only infer from this statistic that the same construct is being
measured across groups. To be precise, this measure tells us that the
tests are measuring something accurately; whether they are
measuring the same or different constructs accurately is not determined
from this statistic. In addition, this indicator does not tell us if
an irrelevant factor or factors are differentially affecting
performance across groups even if it is measuring the same construct.

Factor Structure Bias

Evidence bearing on the first of these issues (i.e., whether or
not the same construct is being measured across groups) is provided by
the most common of construct validation techniques, factor analysis.
Sometimes referred to as factorial validity in validation studies, this
technique identifies clusters of items or subtests that correlate
highly with each other. It also identifies those that don't fit in the
clusters. When employed in the study of internal construct bias, the
patterns of the interrelationships are studied across groups for
congruence. It is reasoned that tests that have different factor
structures may be measuring different psychological occurrences in
response to the test items. Tests that have similar factor structure
are not considered biased with respect to these criteria.
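
One widely used index of the similarity of two sets of factor loadings
is the coefficient of congruence, sketched below. We offer it as an
illustration of what "congruence" can mean numerically; it is not
necessarily the statistic used in any particular study reviewed here,
and the loadings are hypothetical.

```python
import math


def congruence_coefficient(loadings_a, loadings_b):
    """Coefficient of congruence between two columns of factor loadings.

    Both lists must give loadings for the same subtests in the same
    order. Values near 1.0 indicate closely matching loading patterns.
    """
    cross = sum(a * b for a, b in zip(loadings_a, loadings_b))
    norm = math.sqrt(sum(a * a for a in loadings_a) *
                     sum(b * b for b in loadings_b))
    return cross / norm


# Hypothetical loadings of four subtests on one factor in two groups.
group_1 = [0.80, 0.70, 0.60, 0.20]
group_2 = [0.75, 0.72, 0.55, 0.30]

phi = congruence_coefficient(group_1, group_2)
print(round(phi, 3))
```

Because the coefficient is descriptive, it indicates how similar the
loading patterns are but does not by itself test whether a difference
between groups is statistically significant.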

A number of techniques have been proposed to compare factor
analytic findings across groups. Some are capable of testing for
statistically significant differences between groups (Jensen, 19  ;
Joreskog, 1971), while others look for similarities in the results
(}ltum<iu  ,  Kat  /•Miiii'^yet   ^    St(>nner  ,    1077)  • 

In a reanalysis of Nichols' (1972) data that originally reported
the intercorrelations among 13 tests (seven of which were subtests of
the WISC) for large samples of white and black 7-year-old children,
Jensen (1980) found no significant differences across groups for the g
factor loadings (first principal components of the factor analysis)
extracted from the intercorrelations.
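
To make concrete what extracting first principal component loadings
from a set of intercorrelations involves, the sketch below uses simple
power iteration on a hypothetical correlation matrix. It is an
illustrative assumption about the computation, not a reconstruction of
Jensen's (1980) analysis, and the matrix is invented for the example.

```python
import math


def first_component_loadings(corr, iterations=200):
    """Loadings of each variable on the first principal component.

    `corr` is a square correlation matrix given as a list of rows.
    Power iteration finds the leading eigenvector; loadings are the
    eigenvector scaled by the square root of its eigenvalue.
    """
    n = len(corr)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(corr[i][j] * v[j] for j in range(n))
                     for i in range(n))
    return [x * math.sqrt(eigenvalue) for x in v]


# Hypothetical intercorrelations among three subtests (all r = .50).
corr = [
    [1.0, 0.5, 0.5],
    [0.5, 1.0, 0.5],
    [0.5, 0.5, 1.0],
]
loadings = first_component_loadings(corr)
print([round(x, 3) for x in loadings])
```

Running the same extraction on each group's correlation matrix yields
one column of loadings per group, and those columns are what is then
compared across groups.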

In a reanalysis of data presented by Mercer and Smith (1972),
Jensen (1980) again found no significant differences among white,
black, and Mexican-American children of ages 7 and 10 years on the
g factor from eleven WISC subtests. However, the two factor solution
using a varimax rotation of the principal components in each of the
three groups yielded unclear results. The verbal and performance
factors that emerged provided mixed results that Jensen attributed to
sampling error (as few as 48 subjects were used for one of the groups).
However, other factor analytic studies of the WISC provide much clearer
evidence of similarity of factor structure for the WISC across groups.


Similar evidence has been found when the test scores on the WISC-R
of various groups were factor analyzed. In comparing the factor
structure of the WISC-R across whites, blacks, Mexican-Americans and
Native American Papagos, Reschly (1978) found substantial congruence
when the two factor solutions were compared. When using the three
factor solutions (the three factors comprise Wechsler's Verbal and
Performance scales minus the Coding, Arithmetic and Picture Completion
subtests which make up the third factor that is often referred to as the
freedom from distractibility factor), Reschly also found congruence
between the white and Mexican-American samples.


A further analysis of Reschly's (1978) data by Jensen (1980)
showed no significant differences among the four groups on the
principal component. Similar evidence continues to appear in the
literature regarding the similarity of the factor structure of the
WISC-R regardless of the factor analytic procedure used, the statistic
employed to study factor structures' similarities or differences, the
characteristics of the sample (i.e., normal or referred), and the
memberships of the groups that are studied (Blaha, Wallbrown & Engin,
1975; Dean, 1979; DeFries, Vandenberg, McClearn, Kuse, Wilson, Ashton &
Johnson, 1974; Gutkin & Reynolds, 1980; Oakland & Feigenbaum, 1979;
Vance, Huelsman & Wherry, 1976; Vance & Wallbrown, 1978; Wallbrown,
Blaha, Wallbrown & Engin, 1975; Wallbrown, Blaha & Wherry, 1973, 1974).
Not only has this finding been consistent with regard to the WISC-R but
similarly with (1) the WPPSI across blacks and whites (Kaufman &
Hollenbeck, 1974; Wallbrown, Blaha & Wherry, 1973); (2) the MRT across
blacks and whites (Reynolds, 1979); (3) the McCarthy Scales of
[...], 1975); and (4) the Goodenough-Harris Drawing Test across blacks,
whites, Mexican-Americans, and Native American Indians (Merz, 1970).
The sum of these findings leads to similar inferences regarding the
constructs the various tests purport to measure; that is, the
constructs are the same across groups.

While these findings provide strong confirmatory evidence that the
constructs these tests are measuring are the same across groups, they
do not indicate that the constructs are being measured to the same
degree. Irrelevant factors that are not intended to be included in the
measure, yet are present for either one or all groups, are not
detectable through the use of factor analysis procedures.

Item Bias

While those studies that have been reported so far under the
heading of internal construct bias have all examined the overall
pattern of responses to identify the integrity of a measure's construct
validity across groups, none have dealt with the analysis of specific
items or series of items in a search for evidence of bias. The search
for item bias, the oldest practice among all in trying to ferret out
bias in testing (see, for example, Eells, Davis, Havighurst, Herrick, &
Tyler, 1951), is primarily designed to ensure that the individual items
used in a test contribute equally to the meaning of what is measured
across groups. The two most popular methods for identifying biased
items are the group x item interaction method and the item-response
theory method.

Group x Item Interaction Method. Approaches within the group x item
interaction method usually apply either analysis of variance or
correlational procedures. When examining item bias through the use of
an analysis of variance design, the group x item interaction term is
of major interest (Cardall & Coffman, 1964; Cleary & Hilton, 1968).
Such an interaction is an indication that the items are acting in
different ways for different groups. If such were the case, it could
be concluded that the items may not mean the same thing for the
various groups under examination. A similar effect can be noticed by
correlating item difficulties for different groups. The correlation
between the rank order of item difficulties or decrements across
groups will be low if the items are biased (Jensen, 1976). Once such
biasing effects are found, it then becomes necessary to pinpoint those
items that are the most biased ones. A variety of procedures have been
offered to conduct such analyses (Angoff & Ford, 1973; Angoff &
Sharon, 1974; Veale & Foreman, 1975).

The use of the ANOVA technique for identifying item bias has met
with the identification of only small percentages of performance
variance being accounted for by the group x item interaction. For
example, approximately 2% to 5% of the variance in WISC-R performance
is accounted for by this interaction when the responses to items of
black and white children are compared (Jensen, 1976; Miele, 1979).
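
To make the idea of a group x item interaction concrete, the sketch below splits the variation in a small groups-by-items table of proportions passing into group, item, and interaction parts and reports the interaction share. The table values and the function name `interaction_share` are invented for illustration. The studies cited above applied analysis of variance to individual responses, so this cell-mean decomposition is only a simplified stand-in for that design and will not reproduce the 2% to 5% figures.

```python
def interaction_share(table):
    """Share of the sum of squares in a groups x items table of
    proportions passing that is left over after removing the group
    and item main effects (i.e., the group x item interaction).

    table[g][i] is the proportion of group g passing item i.
    """
    n_groups = len(table)
    n_items = len(table[0])
    grand = sum(sum(row) for row in table) / (n_groups * n_items)
    group_means = [sum(row) / n_items for row in table]
    item_means = [
        sum(table[g][i] for g in range(n_groups)) / n_groups
        for i in range(n_items)
    ]

    ss_total = 0.0
    ss_interaction = 0.0
    for g in range(n_groups):
        for i in range(n_items):
            cell = table[g][i]
            ss_total += (cell - grand) ** 2
            # Residual after subtracting both main effects.
            residual = cell - group_means[g] - item_means[i] + grand
            ss_interaction += residual ** 2
    return ss_interaction / ss_total


# Hypothetical data: items keep the same difficulty order in both
# groups and differ by a constant, so no interaction is present.
additive = [[0.9, 0.7, 0.5],
            [0.8, 0.6, 0.4]]

# Hypothetical data: the easy item for one group is the hard item for
# the other, so all of the variation is interaction.
crossed = [[0.9, 0.5],
           [0.5, 0.9]]

print(interaction_share(additive))
print(interaction_share(crossed))
```

A share near zero corresponds to items that order themselves the same way in every group; a large share corresponds to items that behave differently across groups.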

Correlational procedures likewise have produced evidence of little
item bias. For example, rank order of item difficulties across groups
has resulted in consistently high correlation. Rank order correlations
of item difficulty among white, black and Mexican-American samples of
children for the Raven's Progressive Matrices, PPVT, and WISC-R have
all been of the .95 to .99 magnitude (Jensen, 1974, 1976; Sandoval,
1979). A similarly high rank order correlation is reported for item

i  n  t.c  1  i  iq  oi  K";-  \q    (  J    i  i    i.-r;      1 7  b  ) 

Jensen (1976) also proposed a rank order correlation of the
decrements instead of the item difficulties to identify bias in items.
A decrement is the difference in the difficulty indices of two
adjacent items when the items have been ranked for difficulty within
groups. Such procedures have produced correlations in the upper .90's
for the Raven's across white, black, and Mexican-American comparisons,
[...] respectively, for the WISC-R (Sandoval, 1979). Lower rank order
correlations of decrements ranging from .65 to .79 were found across
groups (black, white and Mexican-American) for the PPVT.
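
The two correlational indices described above can be sketched as follows. The proportions passing are hypothetical, and the helper names (`ranks`, `spearman`, `decrements`, `decrement_correlation`) are illustrative rather than taken from any of the cited studies. One assumption is made explicit in the code: so that decrements pair up item for item, both groups' items are placed in the difficulty order of a reference group before adjacent differences are taken.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[pos]]):
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = mean_rank
        pos = end + 1
    return result


def pearson(x, y):
    """Product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Rank order correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def decrements(p_values):
    """Differences in difficulty between adjacent items."""
    return [p_values[k] - p_values[k + 1]
            for k in range(len(p_values) - 1)]


def decrement_correlation(p_reference, p_other):
    """Rank order correlation of decrements across two groups.

    Assumption: items are ordered from easiest to hardest in the
    reference group, and that same order is applied to the other group.
    """
    order = sorted(range(len(p_reference)),
                   key=lambda k: p_reference[k], reverse=True)
    ref_sorted = [p_reference[k] for k in order]
    other_sorted = [p_other[k] for k in order]
    return spearman(decrements(ref_sorted), decrements(other_sorted))


# Hypothetical proportions passing six items in two groups.
group_a = [0.95, 0.88, 0.74, 0.60, 0.41, 0.22]
group_b = [0.90, 0.80, 0.69, 0.50, 0.35, 0.15]

print(spearman(group_a, group_b))
print(decrement_correlation(group_a, group_b))
```

Because decrements depend on the spacing between items and not only on their order, the decrement correlation can fall well below the difficulty correlation even when the items rank identically in both groups.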

If there has been any common finding in the item x group
interaction studies other than the fact that the bias found appears to
be very small (2% - 5% of the variance), it is that the more unreliable
the item (usually the more ambiguous items) the more chance the item
will turn up biased. One thing that has not been found in these
studies is a consistent theme in the content of biased items. As
Flaugher (1978) pointed out, it was the early hope of researchers in
this area that such themes could be identified and systematically
eliminated from tests. However, such has not been the case when we
judge item bias using group x item interaction approaches. The
only thing that is accomplished by expunging a test of its biased
items is to make the test more difficult for all groups since such
items tend to be the moderate to easy items in the test (Flaugher &
Schrader, 1978).

Item-Response Theory Method. Approaches for detecting item bias within
the item-response theory method are relatively new and statistically
sophisticated.
An item characteristic curve depicts the relationship of the ability
level of the test taker with the probability of a correct response. If
the same construct is being measured for all groups studied then one
would expect this relationship to show no differences. Various
item-response theory models can be applied for this purpose ranging
from a more complex three parameter logistic model (Lord, 1977, 1980)
to a simpler application of the Rasch model (Durovic, Note 1; Wright,
Mead & Draba, Note 2). Lord (1977) argues that the item-response
theory method is more appropriate than the item x group interaction
method for detecting item bias.
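
As a minimal sketch of the curves being compared, the code below writes out the three parameter logistic model and the Rasch model as functions of ability and then measures how far apart two groups' curves for the same item lie. The parameter values are hypothetical, the function names are illustrative, and the scaling constant sometimes included in the logistic exponent is omitted. Averaging the gap over an evenly spaced ability grid is an assumption of this sketch, not a procedure taken from Lord or the other sources cited.

```python
import math


def icc_3pl(theta, a, b, c):
    """Three parameter logistic item characteristic curve.

    theta: ability; a: discrimination; b: difficulty;
    c: lower asymptote (chance of success at very low ability).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def icc_rasch(theta, b):
    """Rasch model: one difficulty parameter, no guessing."""
    return icc_3pl(theta, a=1.0, b=b, c=0.0)


def mean_curve_gap(params_1, params_2, low=-3.0, high=3.0, steps=61):
    """Average absolute difference between two groups' curves for one
    item, evaluated on an evenly spaced ability grid (an assumption
    made for this illustration)."""
    total = 0.0
    for k in range(steps):
        theta = low + (high - low) * k / (steps - 1)
        total += abs(icc_3pl(theta, *params_1) - icc_3pl(theta, *params_2))
    return total / steps


# Hypothetical (a, b, c) estimates for the same item in two groups.
item_group_1 = (1.2, 0.0, 0.2)
item_group_2 = (1.2, 0.5, 0.2)

print(icc_rasch(0.0, 0.0))
print(mean_curve_gap(item_group_1, item_group_1))
print(mean_curve_gap(item_group_1, item_group_2))
```

An item whose curves coincide across groups gives a gap of zero; a larger gap means that test takers of equal ability in the two groups have different chances of answering the item correctly.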

While research using item-response theory approaches has
typically yielded results similar to the group x item research, biased
items with interpretable themes have recently been identified (Cole,
1981). Scheuneman (1979), for example, reports that negatively worded
and unfamiliar format items appear biased against black youngsters.
Cole (1981) concludes that further research in this area is needed
given the only moderate agreement on which items in any given test are
biased across samples and which are biased in the same samples when
different methods are used.

Distractor Analysis Method. A third method for identifying bias in
items is through the analysis of distractors. Distractors are the
incorrect responses provided as possible alternatives in items
employing a multiple-choice format. For a test to be unbiased with
respect to its distractors, the incorrect alternatives should have the
same relative degree of attractiveness across groups. In a study of
two popular multiple-choice intelligence tests, the PPVT and the
Raven's Progressive Matrices, Jensen (1974) found errors on the PPVT to
be nonrandomly distributed on many items; but unexpectedly he also
found significant differences between blacks and whites in their
choices of responses on 26% of the items. On the Raven's, a
significant difference in the type of error in choice of distractors was
found between blacks and whites on 12% of the items.
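
One common way to ask whether distractors are equally attractive across groups is a chi-square test on the counts of each wrong answer chosen, taken over the test takers who missed the item. The sketch below computes that statistic for a single item. The counts are invented, the function name is illustrative, and the use of a Pearson chi-square here is an assumption of the sketch; the text does not state which test Jensen (1974) applied.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a
    groups x distractors table of counts.

    table[g][d] is the number of test takers in group g who chose
    distractor d (only those who answered the item incorrectly).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    statistic = 0.0
    for r, row in enumerate(table):
        for c, observed in enumerate(row):
            expected = row_totals[r] * col_totals[c] / total
            statistic += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, df


# Hypothetical counts: both groups spread their errors over the three
# distractors in the same proportions.
same_pattern = [[10, 20, 30],
                [20, 40, 60]]

# Hypothetical counts: each group favors a different distractor.
different_pattern = [[30, 10],
                     [10, 30]]

print(chi_square(same_pattern))
print(chi_square(different_pattern))
```

The statistic is compared with the chi-square critical value for the given degrees of freedom (3.84 for one degree of freedom and 5.99 for two at the .05 level). A significant result says only that the groups err differently on the item, not why.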

Further analysis, however, revealed this difference to be
determined not as a function of item difficulty but rather to be
age-related. In an analysis of the groups' responses to the Raven's,
all cases where potential bias was evidenced showed that the black
children responded similarly to white children approximately two years
younger. When white children in the third and fourth grades were
compared to fifth and sixth grade black children, the difference in
choice of distractor largely disappeared. Jensen (1974) concluded,
therefore, that the systematic error is consistent with our
understanding of the underlying construct and consequently not evidence
of bias.

Judgmental Method. One final method for identifying bias in items will
be briefly mentioned. This method typically employs the use of expert
judges in detecting item bias. While this method has been adopted in
recent years by several test developers, its practice has never been
empirically justified. To the contrary, the procedure has been
demonstrated in several investigations to be no better at detecting
test bias than random selection (Jensen, 1976; Plake, 1979;
Sandoval & Miille, 1979). In the Sandoval and Miille (1979) study, for
example, groups of black, white, and Mexican-American college students
were asked to judge WISC-R items to determine how easy the items would
[...] indicated that none of the groups of judges were able to identify which
items were those that had been empirically determined to be either more
difficult for blacks and Mexican-Americans or of equal difficulty for
all children. Similar results were found by Jensen (1976) when expert
judges were employed.

Facial Bias

The use of judges to determine a type of bias referred to as
facial bias has been proposed in recent years (e.g., Anastasi, 1976;
Cole & Nitko, 1981). This type of bias should not be confused with those
efforts discussed above to help identify item bias. Judgmental methods
for detecting item bias, when employed, are used to improve the
validity of a test and thus help reduce bias. The notion of facial
bias has nothing to do with the validity of a test. Rather it is a
form of bias in that it offends certain groups of people or creates a
perception of validity-based bias. Cole and Nitko (1981) note:

Facial bias would occur when particular words
or item formats appear to disfavor some group
whether or not they, in fact, have that effect.
Thus, an instrument using the male pronoun "he"
throughout or involving only male figures in the
items would be facially biased whether or not such
uses affected the scores of women.

The examination of a test by judges to remove items containing facial
bias is usually undertaken for socio-political purposes or for purposes
of principle or values (Cole, 1981).


Evidence from a review of the studies of internal construct bias
leads to the overall conclusion that little if any bias can be found in
the internal structure of many of the popular psychological tests
commonly employed in decision making. However, this conclusion may be
premature with respect to item bias. As noted, given the inconsistency
of the results across and within the various approaches for detecting
item bias, more research is needed to determine if themes can be
identified that typically bias sets of items for one or more groups.
Although recent findings point to this possibility, the prospects for
findings of any practical significance do not look very promising.
Given our earlier review of external construct bias combined with our
present review of internal construct bias, one would have to conclude
that the search for what Cole (1981) calls a "bias bombshell" has just
not turned up anything; and prospects for its location in the future
while restricting our hunt to the parameters defined in the technical
test bias literature will likely provide us with more of the same
information.

One question regarding the validity of tests in measuring their
respective constructs that is not answered by the technical test bias
literature is whether or not differential group mean performance is
influenced by factors that are irrelevant to the construct being
measured. As pointed out previously, if motivational or emotional
factors, for example, influence performance differently across groups
then marked differences can result in group means and go undetected in
this literature. The criteria examined above have informed us that the
[...] to be biased against minority group members and may in fact be biased
in their favor. In addition, the literature has also told us that in
most cases the tests appear to be measuring the same constructs with a
high degree of accuracy. What the literature does not tell us is if
there are any situational factors, such as self-confidence, achievement
motivation and the like, that are differently influencing the measure
of that same construct across groups. This topic is taken up in the
next chapter.



Chapter  5 

Situational  Bias  in  Psychological  Assessment 


Elaboration of the concept of assessment bias must take
[...] discussed in previous chapters. In Chapter 5 we review
some factors that have been conceptualized as potentially
contributing to assessment bias resulting from factors in the
external testing situation. These factors are conceptualized
as separate from the items per se, and include the following
areas: (1) test-wiseness (e.g., practice effects, coaching),
(2) sex of examiner, (3) race of examiner, (4) language
factors, (5) expectancy effects, (6) motivational factors, (7)
situational considerations, and (8) scoring considerations.
Each of these areas is discussed within the context of
methodological and conceptual issues raised in the area.
Also, some areas are specified for future empirical work in
the assessment area. Finally, some tentative recommendations
are advanced for practice in the area, especially with regard
to psychological and educational assessment of children.

Test-Sophistication 
Test-sophistication, or test-awareness, refers to a
potential source of bias when different persons participating
in testing have different amounts of coaching or practice
prior to taking the test. A number of authors have discussed
issues in this area (e.g., Anastasi, 1981; Jensen, 1980;
Messick, 1981). Test sophistication is not at all a
straightforward concept. Indeed, issues in this area are
characterized by considerable controversy. Research in the
general domain has focused on both practice and coaching (or
training) effects. Jensen (1980) has discussed both of these.

Practice

Jensen (1980) defined practice as "taking the same or
similar tests two or more times at various intervals, without
any implication of special instructions or specific coaching
in test taking" (p. 590). Based upon other literature in
this area (e.g., Jarvis, 1953; Vernon, 1938, 1954a, 1954b,
1960; Wiseman & Wrigley, 1953; Yates, 1953), he advanced 12
conclusions on the effects of practice (Jensen, 1980, pp.
590-591):

1.  Practice effects are naturally greatest for naive
subjects, that is, those who have not been tested
before.

2.  Retesting of naive subjects on the identical test, after
a short interval, shows gains of about 2 to 8 IQ points
for various tests, averaging about 5 IQ points.
(Regardless of the tests used in the various studies
reviewed here, gains are converted to a scale with σ =
15, which is the usual σ for IQ.)

3.  There  is  considerable  variability  in  practice  effects 
among  individuals.  Bright  subjects  tend  to  gain  more 
from  practice  than  dull  subjects. 

4.  The curve of practice gains is very negatively
accelerated with repeated practice; that is, there are
rapidly diminishing returns of repeated practice on the
same or similar tests, yet slight gains have been shown
[...]. Practice gain between the first
and second test experience is usually as great or greater
than the total of all further gains from subsequent
practice trials.

5.  For naive subjects, age makes little difference in the
amount of practice effects. There are more examples of
large practice effects in young children, however,
simply because fewer of them than of older children or
adults have had prior experience with tests.

6.  Practice effects differ, on the average, for various
types of tests, showing the smallest gains for
information, vocabulary, and verbal tests generally and
the largest gains for nonverbal and performance tests,
probably because the materials of the latter tests are
less familiar to most subjects than are verbal and
informational questions.

7.  Practice effects are greater for tests comprised of
heterogeneous types of items than for homogeneous tests.

8.  Practice effects are about 10 to 25 percent less for
untimed tests than for speeded tests.

9.  For naive subjects, practice gains are greater on
group-administered paper-and-pencil tests than on
individually administered tests.

10. Practice effects show surprisingly little "transfer of
training," with the gradient of practice gains falling
off steeply from identical tests to parallel forms, to
[...] Stanford-Binet, for example, is only 2 or [...] points.
Parallel forms of group tests show average practice
gains of 3 to 4 points after one practice session and 5
to 6 points after several practice sessions. One
[...] over the course of eight parallel forms given to London
school children (Watts, Pidgeon, & Yates, 1952).

11. Practice effects are not appreciably diminished by
improving the usual test instructions or by giving short
practice tests on easy items prior to the actual test.
There seems to be no substitute for taking an actual
test under normal test conditions for a practice effect
to be manifested.

12. The practice effect is quite lasting; about three
quarters of the gain found after one week is maintained
up to six months, and half remains after one year.
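
The arithmetic behind conclusions 2 and 12 can be sketched as follows: a raw-score gain is rescaled to the IQ metric (standard deviation of 15), and the portion of that gain expected to remain is read off from the fractions Jensen reports. The raw gain and raw standard deviation in the example are hypothetical, and the function and constant names are illustrative only.

```python
IQ_SD = 15.0

# Fractions of the one-week practice gain that remain, per conclusion 12.
RETAINED_FRACTION = {
    "six_months": 0.75,
    "one_year": 0.50,
}


def gain_on_iq_scale(raw_gain, raw_sd, iq_sd=IQ_SD):
    """Express a raw-score gain in IQ points by rescaling from the
    test's own standard deviation to a standard deviation of 15."""
    return raw_gain * iq_sd / raw_sd


def retained_gain(iq_gain, interval):
    """IQ points of practice gain still present after the interval."""
    return iq_gain * RETAINED_FRACTION[interval]


# Hypothetical test with a raw-score standard deviation of 12 on which
# naive subjects gain 4 raw points on retest.
gain = gain_on_iq_scale(raw_gain=4, raw_sd=12)
print(gain)
print(retained_gain(gain, "six_months"))
print(retained_gain(gain, "one_year"))
```

With these invented inputs the gain corresponds to the 5 IQ point average mentioned in conclusion 2, of which 3.75 points would remain at six months and 2.5 points at one year.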

The conclusion reached on the basis of work in this area has
been that practice contributes very little to bias in
individual or group differences in test performance.
However, whether or not this can be an acceptable general
conclusion is debatable. Several issues in this regard are
advanced at the end of this section.

Coaching

In contrast to practice, coaching refers to an active
[...] providing instructions in how to take the test, modeling
actual or similar items, providing strategies to formulate a
response, and providing feedback on the test taker's performance.
Actually, any type of intervention could be implemented
within the coaching paradigm.

Again, Jensen (1980) has advanced several conclusions
based on empirical work in this area:

1.  Coaching is quite ineffective unless accompanied by
practice at taking complete tests under regular test
conditions. According to Vernon, the leading expert on
the topic, "coaching without practice is singularly
ineffective, regardless of how protracted it is"
(p. 131).

2.  The typical gain from several hours of coaching plus
practice gain on a similar test is about 9 IQ points, or
a coaching gain of 4 or 5 points over and above the gain
due solely to the practice effect of taking a similar
test once or twice previously.

3.  The coaching effect is greatest for naive subjects and
diminishes with prior test-taking experience. Even with
equal prior testing experience, there are substantial
individual differences in gains from coaching; the
moderately bright tend to gain most.

4.  Coaching gains are greater on nonverbal and
performance-type items than on verbal and information
items. Also, numerical reasoning and arithmetic
problems are more susceptible to coaching gains than are
items based on verbal knowledge and reasoning.

5.  Age and sex show no consistent interaction with coaching
effects.

6.  The effects of coaching are highly specific, with little
transfer to other types of tests, and at times there is
even negative transfer to dissimilar tests.

7.  The maximum effects of coaching are achieved quickly;
further gain does not result from coaching prolonged
beyond the first few hours. One study found three hours
to be optimal.

8.  A study of educationally disadvantaged children in
Israel found that coaching on a nonverbal intelligence
test substantially improved the test's validity, that
is, correlation with teachers' marks and with the Verbal
IQ of the WISC (Ortar, 1960).

9.  The effects of coaching seem to fade considerably faster
than the effects of practice per se. A study by Greene
(1928) shows the decline over time in the gains on
Stanford-Binet IQ from coaching children on the very
same test items or on similar items; the control
children were tested at the same times as the
experimental groups, but they were never coached, and
so their gains represent only practice effects.
(Jensen, pp. 591-[...]).

Test Sophistication and Interaction with Race and Social Class

The effects of practice and coaching have been
investigated on both racial and social class dimensions
(e.g., Baughman & Dahlstrom, [...]; Dubin, Osburn, & Winick, [...];
Dyer, 1970; Costello, [...]; Turner, Hall, & Grimmett, [...]).
These studies, reviewed by Jensen (1980), generally
show negative results on the interaction of practice and
coaching effects with either race or social class. However,
some small effects were noted in the Dyer (1970) study, but
this was conducted on a college sample. Thus, from this
limited literature one can conclude that minimal effects of
practice and coaching may appear.

Considerations

Test sophistication represents an important variable in
any testing and ultimately influences the validity of the
tests employed. Before any firm conclusions can be made in
this area it is important to consider several issues,
including the conceptualization of test sophistication and
the potential areas that can be used for intervention in this
area. To begin with, test sophistication does not refer to a
set of homogeneous factors. The nature of what test
sophistication is or what effects it has cannot be addressed
in the abstract, but rather depends on the specific features
that make up this construct. Related to those issues is the
type of actual interventions that are employed. As Anastasi
(1981) notes, different kinds of interventions will have
different effects, consequences, and implications on test
performance. It is generally assumed that most children in
American culture have had extensive exposure to standardized
forms of testing. Indeed, it is often assumed that children
experiencing problems in school will have more experience
with tests than those not having problems. Jensen (1980)
notes:

Since the 1950s, virtually all children in the
public schools have been increasingly exposed to
standardized scholastic aptitude and achievement
tests, from the primary grades through high school
and college, so that exceedingly few pupils by age
10 or so could be regarded as naive in respect to
tests. Because of the concern of teachers and
parents, the least able pupils or those with
special learning problems are apt to be tested the
most, especially on individual tests given by a
school psychologist. Therefore, it seems most
likely that in the present day very little of the
variance in standardized aptitude or achievement
test scores can be attributed to individual or
group differences in test sophistication, with the
exception of recent immigrants and persons who have
little or no formal schooling or who have gone to
quite atypical schools (p. [...]).

Nevertheless, the degree to which children have previous
experiences with individual intelligence tests may vary and
brief orientation sessions may be effective in equalizing
test sophistication. To the degree that pre-existing
differences can be reduced through test orientation, a more
valid measure should be obtained (Anastasi, 1981). Also, as
Jensen (1980) notes, coaching and practice may help "equalize
test sophistication among persons with differing amounts of
past experience in taking standardized tests or who differ in
the recency of tests" (p. 596). Thus, validity may actually
be enhanced to the degree that test sophistication
differences are minimized or eliminated. However, this
should be done when evidence has been gathered that
individuals or groups have little or no test sophistication.
Methods for doing this are discussed later in the report.
Also, in cases of severe problems with orientation to a test,
more specific interventions may be necessary (see later
discussion).

Another issue related to interventions on tests is that
the effect of training may be specific to the skills used
during training. In some respects the training conducted on
test items is very similar to many intervention programs in
behavior therapy or behavior modification where
generalization has not always occurred. Indeed, unless
specific attempts are made to facilitate generalization, it
is likely that it will not occur (Stokes & Baer, 1977;
Wildman & Wildman, 1975).

Aside from the interventions focused on specific test
items, the focus of training efforts has also been on more
broad-based cognitive skills. In this case virtually any
intervention that could be implemented with cognitive skills
could be used. These more broad-based cognitive skill
training programs are discussed later in the chapter (see
pages   ). Suffice it to say that the focus of such programs
raises issues of why tests should be used as a dependent
variable when the broader curriculum is the real (and
correct) focus of improvement.

A final consideration in this area relates to the
methodological problems in studies that have been published to
date. Studies in this area are far from methodologically
pure, although there are some well designed investigations.
Future research would need to consider several issues (cf.
Anastasi). Namely, inclusion of a control group, random
assignment of subjects to groups, the comparability of
pretest and posttest sessions with regard to motivation to
perform well, and the assessment of generalizability or
transfer of training to nontest performance.

Motivational and Situational Factors

A variety of motivational and situational factors have
been discussed within the context of test bias. These
include motivational manipulations, test anxiety,
modifications in test procedures, and various situational and
procedural issues.

Motivational Factors

Numerous studies have been published that focus on
motivational factors that may increase test performance.
Considerable diversity of procedures, subjects, and
incentives characterize these attempts (Kratochwill et al.,
1980). Some studies have reported that when test responses
are reinforced, performance is higher than previously (and/or
the performance of control subjects under standard
conditions) (e.g., Ayllon & Kelly, 1972; Bergan, McManis, &
Melchert, 1971; Edlund, 1972; Hurlock, 1925). However, the
results are not always in favor of reinforcement. For
example, some researchers (e.g., Benton, 1936; Maller &
Zubin, 1932; Tiber & Kennedy, 1964) found no significant
difference in performance between subjects tested under
standard conditions and those tested under reinforcement
conditions. Also, Clingman and Fowler (1976) studied the
effects of candy reinforcement on IQ test scores in first and
second graders. No differences were found among these
conditions (candy given contingent on correct responses,
candy given noncontingently, or no candy given) on
test-retest administrations to the Stanford-Binet (Form L-M).

Smeets and Striefel (1975) compared deaf children's
scores on the Raven Progressive Matrices (Raven, 1938) when
tested under (1) end-of-session reinforcement, (2)
noncontingent reinforcement, (3) delayed reinforcement, and
(4) immediate contingent reinforcement. The authors found
that while the mean posttest score of subjects tested under
the immediate contingent reinforcement condition was significantly
higher than that of any other groups, no significant
differences were observed among the mean posttest scores of
the three other groups.

In the Clingman and Fowler (1976) study, the authors
also compared the effects of contingent candy reward,
noncontingent candy reward, and no candy on the IQ scores
(PPVT, Forms A and B) of children whose initial scores placed
them in three different IQ levels. Results showed that candy
reward given contingent upon each correct response increased
test scores for the initially low-scoring subjects, but had no
effect on the scores of middle- and high-scoring subjects.

Some minority group studies have shown that
reinforcement (praise or candy) did not affect black
children's Stanford-Binet scores (Quay, 1971; Tiber &
Kennedy, 1964) and that feedback and reward led to
significantly higher WISC Verbal Scale scores of lower class
white children, but not of lower class black or middle class
white children (Sweet, 1969). Cohen (1970) found no
significant interaction of verbal praise and candy incentives
with whites and blacks on the WISC Block Design performance
of second and fifth graders. Also, Wenk, Rozynko, Sarbin,
and Robison (1971) found that there were no significant
interactions of race with (a) material reward, (b) verbal
praise and encouragement, and (c) no incentive, on white and
black delinquent male youths on performance on four of the
nonverbal performance tests of the General Aptitude Test
Battery.

In contrast to these findings, Klugman (1944) found that
in 7 to 14 year-old children a money incentive (as compared
to praise) for correct responses on the Stanford-Binet
improved the mean IQ of blacks by 4 points, but no
significant effects were found for whites.

Considerations

The usual conclusion from this literature is that the
effects of incentives on minority students are negative
(with the exception of Klugman, 1944) (Jensen, 1980), or that
the use of incentives does not increase children's scores
over and above the usual testing situation which may itself
be somewhat motivating (Sattler, 1982). Actually, the issue
of whether or not incentives have any effect on test
performance is not easily discernible based on the existing
literature. Several issues need to be addressed in this area
(Kratochwill et al., 1980). To begin with, variations among
studies also make trends difficult to identify. For example,
some studies have focused on the effects of certain types of
reinforcers, such as praise (Bergan et al., 1971; Hurlock,
1925; Roth & McManis, 1972; Tiber & Kennedy, 1964) and candy
(Edlund, 1972; Tiber & Kennedy, 1964) on test performance.
Vastly different conditions have been developed to represent
the "reinforcer."

There have also been variations in procedure. For
example, in some studies children received reinforcement
immediately after every correct response (e.g., Bergan et
al., 1971; Edlund, 1972; Roth & McManis, 1972), while in
other studies they received reinforcement after every subtest
or when the test was completed (Ayllon & Kelly, 1972;
Hurlock, 1925; Tiber & Kennedy, 1964). Yet in other
investigations, subjects were promised reinforcement if they
performed better (Benton, 1936; Maller & Zubin, 1932). Thus,
variations among studies make any clear trend difficult to
determine.

Another major concern is that studies have not indicated
what reinforcement procedures constitute an optimal
motivating condition. As traditionally conceived, positive
reinforcement refers to an increase in the frequency of a
response following the presentation of the event (Kazdin,
1980). Whether or not one can identify an event as a
positive reinforcer is determined empirically by examining
the relationship between the event and behavior. The point
is that it is questionable whether or not the vast majority
of published studies in this area have adequately tested
reinforcement effects on test performance.

Current problems in this area are not likely to be
addressed with the exclusive reliance on large N
between-group methodology (Kratochwill & Severson, 1977).
Thus, there may be no best reinforcer for a random group of
children (Parton & Ross, 1965). Also, Schultz and Sherman
(1976) were unable to draw any firm conclusions regarding
reinforcement after reviewing approximately 60 studies in the
reinforcement literature. Similar to others (e.g., hiii^^tt h
1'^^^), they concluded that reinforcers should be
individually determined rather than depending on a priori
assumption. Thus, each child may have a different reinforcer
and this would need to be determined. Several procedures
could be used to determine this, including a reinforcement
hierarchy procedure (Forness, 1973), various self-report
schedules (Sulzer-Azaroff & Mayer, 1977), and most
importantly, an empirical determination of reinforcing events
(e.g., Bijou & Grimm, 1975; Bijou & Peterson, 1971; Lovitt,
1975).

Another issue that needs to be addressed in this area
relates to whether or not changes in IQ test scores are a
relevant focus of efforts. Connor and Weiss (1974) argued
that it is unwarranted to assume that an increase in correct
responses is necessarily paralleled by an increase in
"cognitive ability." Thus, if the effects of reinforcement
in test-taking situations are limited to a motivational
function, and if all populations from which samples are drawn
demonstrate the same increase in motivation, then
administration of reinforcement will shift the distribution
of scores upward, resulting in each subject's relative
position remaining the same. However, as Clingman and Fowler
(1976) note, if further research substantiates the notion
that only select populations benefit from reinforcement in
pretests, then the use of reinforcement would not increase
the motivational level of all subjects but would selectively
enhance the performance of children for whom correct
responding is maintained by other than external
incentives. If this were the case, the validity of the
test would be improved by reducing motivation specific
effects. If such changes occur, it would likewise be
necessary to determine the effect motivation has on the
predictive utility of the test. While it can be argued that
the use of such reinforcement results in performance that
better reflects what the individual actually knows, it may
also result in the test becoming less predictive of some
external criteria. This would be the case if such
reinforcement did not influence performance on the criterion
to a similar degree. In essence, what one may have is a test
that better measures the construct it purports to measure but
also a test that does not predict the criterion as well as
the test that did not induce optimal performance. If such
were the case, the validity of the test would be different
for different groups and consequently biased. Needless to
say, more research in this area is necessary to sort through
the various possibilities.
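
The distinction drawn above between a uniform and a selective motivational effect can be illustrated with a small numerical sketch. All of the scores below are hypothetical and are not taken from any study cited here; the criterion measure is assumed to be unaffected by reinforcement, and the sizes of the gains (5 and 12 points) are arbitrary. Under those assumptions, a uniform gain leaves each subject's rank and the test-criterion correlation unchanged, whereas a gain confined to the initially low-scoring subjects reorders the subjects and lowers the correlation.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs)
           * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den


def ranks(scores):
    """Rank of each subject (1 = highest score); assumes no tied scores."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    out = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        out[idx] = rank
    return out


# Hypothetical test scores under standard administration.
standard = [85, 90, 95, 100, 105, 110]
# Hypothetical criterion scores, assumed unaffected by reinforcement.
criterion = [70, 74, 77, 82, 85, 90]
# Hypothetical flag: which subjects were initially low scoring.
low_scorer = [True, True, True, False, False, False]

# Case 1: every subject gains the same amount (arbitrary 5 points).
uniform = [s + 5 for s in standard]
# Case 2: only initially low scorers gain (arbitrary 12 points).
selective = [s + 12 if low else s for s, low in zip(standard, low_scorer)]

for label, scores in [("standard", standard),
                      ("uniform", uniform),
                      ("selective", selective)]:
    print(label, ranks(scores), round(pearson(scores, criterion), 3))
```

In this sketch the selective gain makes the test less predictive of the criterion only because the criterion is held fixed; if reinforcement raised criterion performance to a similar degree, the correlation would not be expected to drop in the same way.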

Finally, the issue of whether or not children should be
reinforced depends somewhat on the standardization procedures
for the test involved. Deviation from standard procedure
changes the meaning of scores (Cronbach, 1960), and may
actually invalidate test norms (Braen & Masling, 1959;
Sattler, 1974; Strother, 1945). Tests may vary considerably
in the way this issue is handled. For example, some test

well as correct responses ("effort"). While this may
increase the total number of responses, it may not affect the

Situational Factors

There is a rather large literature on various
situational factors related to test performance. Much of
this literature has been reviewed (see Anastasi, 1976, Ch. /;
Jensen, 1980, Ch. 12; Sattler, 1974, Ch. 6, 1982; Sattler &
Theye, 1967, Ch. 5). An important issue is whether or not
any of those situational factors interact with cultural
group to produce differences on mental test performance
(Jensen, 1980). Here it is assumed that if the child does
not perform as well as possible during the testing situation,
an inaccurate reflection of classroom performance may occur
(H.':>ch)  y  ,  1^70). 

Research has been directed at explicating the
situational factors which may allow testing situations to
yield a valid assessment of the child's cognitive abilities.
Some research involving the use of familiar examiners
(Thomas, Hertzig, Dryman, & Fernandez, 1971), positive
pretest interactions between examiner and child (Jacobson,
Berger, Bergman, Millham, & Greeson, 1971), and testing
location (Seitz, Abelson, Levine, & Zigler, 1975) has
suggested that a situational/motivational explanation for the
poor performance of economically disadvantaged minority
children is a definite possibility. Also, some research has
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^  K  Mil.  M)'.  t  1  .» t      1    I  : ;.  M    <  M    ,  M  1 V  1  n  t  .1' ,     i    I  li  I  i  (1  r      \     i  i  '  '  i  k  '  >  .  i  1 1  n  »  d 

.  •    1  .  ,    I ,  I  1 )( )V'  ,     '  ^>  / 11  ;       I  <i  1  •  "  1  ,    Al )( '  1  !.i  M  I  ,    f,     ,  H    I  (  /  ,     1  '17  M    •  »M'  (    t  Ik  • 
♦  -        1     1   IP  ;  !  1  < '  u    ( » '  . '  I  .  ^  t  r  ( ■  i  ^;  s<  I  n  t  h  ,    V  -n      « -n.  m  ,    I  .i  v  •  k  i  , 

Mr.hiy,    K  >:  I  ,1 1  orhwi  1  1  ,    1^)77)    rno^jols  t^x  rx)  5-;«.  d    to   ,m  ^^x<nninoo 
prior     t  ( ^   . »    t  •  ^: .  t    .i'  ini  n  M    t  T    t  1  on    f  1. 1  V(  •    \  M  .i  I  •    1    t  »<  •  i  i  w  t     .  i  u  . 

»vr'-pl.',    IM  .  r>t    ril.     (l')77)     foiiivl    t!i.\t  t)f(^tor>t: 

vic.ir  w-t!       .it.Uritir)n    m   which  :ninority  qroup  children  watchod 
i\   .s.      ;  1- m  i  rin  t  ('   '     lr)t.r.>o  of   <\   wli  i  t  f   •'Xd'iiin.  »    to^dinq  n 
nunMrity   fhil  '    under    pf.)r;  i  t.  i  v»  ^   ccniri  i  l  1  on  5;    (^^.rj.,  praire^ 
r,    .vjlt»"i    \n    only    1    .  s  V,   of    th(^   Wir^'   ?'       -omv.   b^'ino    ISO  holow 
th-'  rrir-nn,     -.'ire''  is  ind    S  ? .  4 of    1  ho   S(-v)rof;  rv 

below   th-'  nio^n   urpior    st^ndarci   nnd    feed!  coruli  ,on';, 

res  no   t  i  v(»  1  y  . 

Some investigations have shown that low SES preschool
children (black and white) obtain higher scores on the
Stanford-Binet (Form L-M) when a test administration
procedure allows a maximum number of successes early in the
testing experience than under standard administration
procedures (Zigler & Butterfield, 1968). Performance was
optimized by such procedures as presenting easiest items
first and giving an easier item from an earlier age level
when the child failed two successive items. Ali and
Costello (1971) found that randomizing the difficulty level
of the items on the Peabody Picture Vocabulary Test (PPVT),
along with other procedural changes, led to higher scores
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t  ;  ■  i    ■     .  .      ■  .  t  .  M  !   1  !       /  M  ■  I  '    '  ■        :  '  ,     .1     ■  i  I  <  ^  }      1  I  I  ,  It  ■  1',      p  r  .  ■         M'  • 

,    U  »  1  >  I  I  ♦  "M  .         i     U  .  M   t  I  j;     I  t  ♦  ■  I   ■  ♦  *  ;  .  (  U<  1  1  "  >  ■      '  i'       M'  •  t  1  IK  ■  1  (  1   '  ♦  (  •     t  [p  ' 

I  , , ,  ,      !  •  1 «  ■  ■  I  .  I'  1  f     ' .  I  I  I  i    ^  •        <  '    ^      I  (  X  ■  1    1  I  i  r  -      ,    : ;  1  f    M  ' 

t  h<  ■    t  » •  M  1 1  t  • .    h  I  ■ '  r    nn  t     1  )■  '  (  -  n    .  1  n  1  1  \'     M  1       •  i  ^  i  ?  <  i  I  » -  I  \'    1  >y    t  <  i'  'f'  .  As 
,K.ii..<n  ru^t.'!'.,     .im  h   (i.'T>,jrtut         from   s    1     1  a  r ''1 

» '  1 T.  I  n  i  • .  (  t  .  >  t  i  (  u ) ' :    only    ■ . '  i  r > w    I  h .  i  f     t  h  •    ( •  h  m m  i  n  1    n r  o i ' •  ^ ^1  u r    • :     k  ,in 
[i.,v<'  .  ii.'''t    on    tl).'   .iv<'f,»M«'  ''i     .  •    Off,.       In  i 

s  t  11'  i  V   V  ')  i  ( ■  i  1    V <  1 T  1  o(  j    t  h '  ^    t  ( • *  I  no    .  i  t  ni  <  >  , m^.  .  i         ,    r    1  u  X 

Mo;iM--lik»'   vct.siir,    f(M'M,n,    ^ -v  a  I  u  a  t:  i  v  t» )  i  i  n  i    t  r  a  t:  i  or'i  of 

tlh-   V.'lf'  '   to  :M^n  v;hit^'    \U(]   7  (U?   M  •  -  ■  ^-  /iri'l     '>nior  hiqh 

-ncx  -  1     •  ■  '  1  i.  ■)  t  s     (  S-iMii). '  1  ,     1  '''/V  )  ,    iv'>       '  ^  . . :  t       ;P.M  n    r»  f  f  ^ 

0  1     t  })•      ■  ■      ■  I  iio    ,    o  )f  1  1  t  n  ) :  !• .    .  i :  t  ;    n^       i  >  ,  1 1 1  i     i  -  '  '  :  i  u  '  t  !  >  M '  • 
v;ith    t  oo.'  (^f         ,    1'^^,    :w^x    ()t     f\  ,    or  pt  otat  ion  y>  '  10 
level was found. Yet research in this area does point to the
fact that standardized test practices do not induce optimal
performance, a fact that needs to be more fully examined for
its potentially biasing effects.

Test Anxiety

A considerable body of literature has developed in the
area of test anxiety, yet relatively few studies have
investigated the relation of test anxiety to test bias or
the nature of racial group differences in test
anxiety. General reviews of the test anxiety literature have
appeared (e.g., Anastasi, 1976; Morris & Kratochwill,
1983; I. G. Sarason; S. B. Sarason, Davidson,
Lighthall, Waite, & Ruebush, 1960; Sattler, 1974, 1982) and
Jensen (1980) has reviewed work in the area of test anxiety
and bias.

Research on test anxiety and bias is generally
inconclusive and has examined a very narrow version of the
anxiety construct. Solkoff (1972) administered the Sarason
Test Anxiety Scale to black and white children between the
ages of 8 and 11. Results showed no significant race
difference, no significant interaction of S's race X E's
race, and no significant correlations with WISC Full Scale
IQ. Jensen (1980) reviews two studies in this area by him and
his associates. In the first (Jensen, 1973e), a
questionnaire measure of manifest anxiety [the N (neuroticism)
scale of the Junior Eysenck Personality Inventory] was given
to samples of white, black, and Mexican-American children in
grades 4 to  . He found a significant (but small) group
difference on this measure, with the whites obtaining higher
anxiety scores. Also, there were no significant correlations
with verbal and nonverbal IQ and scholastic achievement
tests. In a later study, Jensen and Figueroa (1975) examined
the interaction between race and immediate versus delayed
recall of aural digit series (digit span is purported to be
sensitive to measurement of anxiety). However, in a large
sample of white and black school children in grades ? to 8 no
significant interaction was found in digit span scores.
Similar to these results, Noble (1969) found no differences
in pulse rates of black and white elementary school children
immediately before and after being individually tested.

Considerations

The conclusion of reviewers of this literature is that
there appears to be no "consistent" or "appreciable"
differential effect of anxiety on the test performance of
whites and blacks (e.g., Jensen, 1980). From the available
literature, this must be the conclusion. However, we should
point to several problematic issues in this area. To begin
with, there just isn't enough empirical work to draw any firm
conclusions because "anxiety" assessment has been confined to
a rather narrow range of measures. For example, Noble (1969)
measured pulse rate, but this is only one of several
physiological measures that could be employed. Likewise in
the Jensen (1973e) and Jensen and Figueroa (1975) studies
only a limited measure of the so-called anxiety construct was
employed. Thus, investigation in this area suffers from a
construct validity problem in as much as it is not at all
clear whether or not anxiety was even assessed.

In order for a reasonable assessment of anxiety to
occur, measures should be taken on cognitive, behavioral, and
physiological dimensions. Assessing these three response
modes provides a more adequate test of whether or not anxiety
occurs on more than one measure (Morris & Kratochwill, 1983).
Also, it is important to assess each of these measures
through a device or procedure that measures some aspect of
the three construct dimensions. That is, for example,
physiological arousal can be measured through either
behaviors, self-report, or physiological equipment (e.g., GSR,
heart rate, blood pressure). Until investigations in this
area take into account advances in anxiety assessment, little
light is likely to be shed on the role of test anxiety in
assessment bias.

Other Variables

Achievement Motivation. Achievement motivation (n-Ach) has
been identified as a possible source of assessment bias
because it is noted that various cultural groups may differ
in their level of n-Ach (Chapman & Hill, 1971). The n-Ach
construct is said to influence test performance (a) by
determining the level of motivation (e.g., interest, effort,
etc.) during development and prior to taking various mental
tests, and (b) by influencing motivation during the actual
test (Jensen, 1980).

Jensen (1980) noted that conclusions on the role of
"achievement motivation as a factor in systematic group
biases in testing are virtually impossible in terms of the
empirical evidence" (p. 616). There are several reasons that
have been identified. First of all, there are, as in the case
of anxiety, problems in how the construct has been defined
and measured. Second, many of the measures that have been
used to measure this construct (e.g., projective tests)
suffer from reliability and validity problems. Third, many
investigations apparently do not show a strong correlation
between n-Ach measures and intelligence tests (Heckhausen,
1967). This latter finding has led Jensen (1980) to
speculate that high n-Ach is more a product of high ability
than the reverse. Thus, no evidence can be advanced that
supports the role of n-Ach in assessment bias.

Self-Esteem. Self-esteem is another construct that may
contribute to assessment bias (Jensen, 1980), but
unfortunately, no empirical work has examined this
possibility. If one particular racial or minority group
were to have different levels of this construct than a
majority or white population, a case for assessment bias
could possibly be built. Jensen reviewed procedures
for testing this hypothesis and the reader should consult the
review for specific recommendations.

Reflection-Impulsivity. Considerable empirical work has
been conducted in this area (see Messer, 1976, for a review).
The assumption in work in this area is that some individuals
are reflective in response style on certain tests.
Characteristically they would ponder alternatives before
responding. In contrast, impulsive individuals are quick to
respond and may fail to weigh all the alternatives. Kagan's
Matching Familiar Figures Test (MFFT) is a common measure of
this construct. Unfortunately, this is again an area where
virtually nothing is known about its influence in assessment
bias (Jensen, 1980). Jensen (1980) does speculate that
reflectivity is highly related to g.

Race of Examiner

The race of the examiner has often been proposed as one
major source of bias in the assessment process. Indeed, one
primary procedure that has been suggested in the spirit of
meeting nondiscriminatory assessment criteria is to use a
minority group examiner to assess the minority child
(Kratochwill et al., 1980). The tactic is not to just use a
minority examiner, but rather one that matches or closely
approximates the minority status of the child being assessed.

The examiner's race has been hypothesized to be an
important factor in affecting the minority child's test
performance through (1) the possibility that the child's
perception of the testing situation leads to inappropriate
behaviors which are judged by the tester to reflect low
ability, and (2) the possibility that final scores are biased
by the examiner's expectancies for performance of minority
children resulting from pretest referral information and
unfamiliarity with the examiner's cultural background and
dialect (Meyers et al., 1974, p. 22). In practice, this
concern has been translated into some specific actions. For
example, Garcia (1972) noted: "Be skeptical about
utilization of standard diagnostic instruments when used to
identify the learning behaviors and capabilities of
bilinguals. Instead, utilize bilingual clinicians to assist
in the identification process" (p. O). Most recommendations
of this sort relate to individual or one-to-one forms of
assessment that may at least involve some form of
interpersonal relationship.

Over the years numerous authors have suggested that
racial differences may affect the examiner-examinee
relationship (e.g., Anastasi, 1958; Anastasi & Foley, 1949;
Garth, 1922; Hilgard, 1957; Klineberg, 1935, 1944; Pettigrew,
1964; Pressey & Teter, 1919; Riessman, 1962; Strong, 1913).
Some authors have noted that ethnic differences can create an
"atmosphere bias" and this should be considered a part of the
domain of test bias (e.g., Flaugher, 1978). Flaugher (1978)
noted that the very act of testing itself may be unfair to
certain minority individuals because the situation itself
inhibits usual or typical performance. It is certainly
possible that any bias in assessment could be reduced if the
examiner possessed a language, value system, cultural
information, and a familiarity with learning strategies
similar to those of the client.

The conclusion that the race of examiner is a potent
factor in test bias is not at all clear. There are both
methodological and conceptual factors that have a bearing on
any conclusions in this area. One of the most careful
examinations of empirical research in this area is presented
by Jensen (1980). From research conducted between 1930 and
1977, Jensen classified studies into three main categories of
experimental design: (1) inadequate designs, (2) adequate but
incomplete designs, and (3) adequate and complete designs.
Jensen defined an adequate design in the following way:

To be an adequate design, an experiment on the effect of
the race of examiner on test scores should meet the
following two minimum requirements: (1) at least two
(or more) Es of each race and (2) randomized assignment of
subjects (Ss) to Es. These requirements are obvious.
If there is only one E of each race, the variable of
race is wholly confounded with the other personal
attributes of each E. Randomization is needed to rule
out the possibility of any selection bias that might
result in a spurious (i.e., non-causal) correlation
between Es and the trait being measured. Any study that
does not meet these minimal requirements of experimental
design is classified as inadequate. When it is not
clear whether the study meets these requirements, I have
given it the benefit of the doubt and classified it as
adequate (pp. 596-597).
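
The two minimum requirements quoted above (at least two Es of
each race, and randomized assignment of Ss to Es) can be
illustrated with a short sketch. This is not a procedure taken
from Jensen; the function name, the examiner and subject labels,
and the balanced round-robin allocation are all assumptions made
only for illustration.

```python
import random
from collections import defaultdict


def assign_subjects(subjects, examiners_by_race, seed=0):
    """Randomly assign subjects (Ss) to examiners (Es).

    examiners_by_race maps a race label to a list of examiner ids.
    The design is rejected unless every race has at least two Es,
    which is the first minimum requirement quoted in the text.
    """
    for race, examiners in examiners_by_race.items():
        if len(examiners) < 2:
            raise ValueError(f"need at least two examiners for {race!r}")

    all_examiners = [e for es in examiners_by_race.values() for e in es]

    # Shuffle the subjects so that who tests whom is decided by chance
    # (the second minimum requirement), then deal them out in turn so
    # that examiner workloads stay balanced.
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)

    assignment = defaultdict(list)
    for i, subject in enumerate(shuffled):
        assignment[all_examiners[i % len(all_examiners)]].append(subject)
    return dict(assignment)


# Hypothetical labels, used only to exercise the function.
examiners = {"majority": ["E1", "E2"], "minority": ["E3", "E4"]}
plan = assign_subjects([f"S{i}" for i in range(1, 13)], examiners)
```

In a complete design the same shuffle would presumably be run
separately within each racial group of Ss, so that every
combination of S race and E race is represented.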

Jensen (1980) also determined that an adequate design is
incomplete in the case where subjects are sampled from only
one racial group and complete when Ss are sampled from two or
more racial groups. A good design (nonrepeated measures) is
presented in Table 5.1. In this design it is the interaction
of S's race and E's race that is crucial to testing the
hypothesis.

Adequate and Complete Design for Assessment
of Race of Examiner Effects

                         Race of Examiners
                       Majority      Minority

           Majority
Race of
Subjects
           Minority

Source: Adapted from Jensen, A. R. Bias in mental testing.
New York: Free Press, 1980.
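
The role of the interaction in a 2 x 2 layout of S race by E
race can be made concrete with a difference-of-differences
computed from cell means. The sketch below is only an
illustration under stated assumptions: the function name and the
scores are hypothetical, and a real analysis would use a
significance test (for example, a two-way analysis of variance)
rather than the raw contrast.

```python
from statistics import mean


def interaction_contrast(scores):
    """Difference of differences for a 2 x 2 (S race x E race) layout.

    scores maps (subject_race, examiner_race) to a list of test scores.
    A value near zero means the examiner-race effect is about the same
    for both subject groups, i.e. little sign of an interaction.
    """
    m = {cell: mean(values) for cell, values in scores.items()}
    effect_for_majority_s = m[("majority", "minority")] - m[("majority", "majority")]
    effect_for_minority_s = m[("minority", "minority")] - m[("minority", "majority")]
    return effect_for_minority_s - effect_for_majority_s


# Hypothetical scores, invented only to exercise the function.
scores = {
    ("majority", "majority"): [100, 102, 98],
    ("majority", "minority"): [99, 101, 100],
    ("minority", "majority"): [90, 92, 88],
    ("minority", "minority"): [91, 93, 89],
}
contrast = interaction_contrast(scores)
```

A difference between the two groups of Ss that is the same under
both kinds of E shows up in the row means, not in this contrast,
which is why the main effects alone do not bear on the race of
examiner question.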

Based on these organizational formats, Jensen (1980) has
drawn conclusions from investigations within these areas. In
the case of inadequate designs, virtually no conclusions can
be made, although about half support the hypothesis that black
Ss perform better when tested by a black E than when tested
by a white E.

Studies with an adequate but incomplete design format for
the most part indicate no significant effect of race of the E
on Ss' test performance, although one showed a significant
effect of race of E. In a more recent study that fits into
this conceptualization, Terrell, Terrell, and Taylor (1980)
investigated the effects of race of examiner and type of
reinforcement on the intelligence test performance of
lower-class black children. The authors found that children
given tangible rewards, regardless of race of examiner,
obtained significantly higher scores than did children given
no reinforcement or children given traditional social
reinforcement. Moreover, the children given culturally
relevant social reinforcement by a black examiner obtained
significantly higher WISC-R test scores than did children
given culturally relevant reinforcement by a white examiner.
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s';,rn.n^  ick   ^  (1  '^(^^^  ) 

Knohlock  Pel  OS  i    (10r,H)  Abrcimson  (inr.O) 

I''o  r  r  f-r. t 

Kl  .111':    {}'^■>^)  r'<>r;tr»Mo    fpr/M)  rroO^n  (1070) 

\.  i'  r  (  1  '•'-■1  )  ;-••!!     {!''/')  f^VM'T     (  1  ) 

P.'ttiMt.w  Morc^    ^    KVMiish    {in/zi)  Could    V    K  1  (M  ti    (  1  '»/ 1  ) 

I,irv.it7  Vt^roff,         (M  (.» 1  1  cirvi  , 

1^  Mo  r  ou  i  r,    (10?  Ui  , 

CalrlwcO]        Kniqht:  Yando,  Ziqlor, 

p  07 )  Ha  tor>    (  ]  971  ) 

Sc-ott,    Mart  son,  S^lkoff  (197?) 

Ctmn  1  nq  han    (M^/r,)  Savanc   ^  (1972) 

Franci-    (1  97  n 
W'  '  1  1  boT  n  ,    Ro  1  '1  ,  ^ 

K(Mch<\r(l  .(  1  97  M 
Jensen    (  1  91  Ac) 
Mar  v;  it        ^J^>l]7Uinn  (197') 
^   So  J  kof  f    (1  97^1  ) 

Ratusnik   &   Ftoen  in  sknecht 

(1  977) 
Samuel  (1977) 


Source: Adapted from Jensen, A. R. Bias in mental testing. New York:
Free Press, 1980.
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,    f  1  .   .    I       1  .  .  !       I  .,1     .       .  .  !      .    ^    r "  1  1  i  .    •  •  '       !  '      i  I    '  . 

l-  l  n.t  M  V  r     »  "     «       '1'''^"  »^        '    '      H  t<  M       ,  t  .  •     .  n'  I       '      I     •   t       J.  •  •   1   (M  .  , 

I,  .  ;  ,    ,  .    M      M  '  '   '  '  ^      I  !^  ^  t  .  M  ,   ,  t       )  I'     (  ,  I  M      (  .  I       I  '  >     -J  I   -        'I        «  •  1   f  <   <    t      .    '      !  h  ■ 

.  t  I    i;    y    I      r  •   ( »  (     .  .      : :  ^  u.  i  I  .  •  ■ .    t  1 1. .  I    <  1  <  'mt  •  n  • .  I  t  .»  «  •  •  .i 
' ,  I .  I  ( ,  1  (  I  .M  !  I  t     .  •  {  I  <  M  '  t     1 1 .  r  . '    I )  ■    I  1  1  <  1  M  '  I   •    « '  f  f  <  •  ^  M  ' .    o  f     \  i>\  I. 

{ ►  V,  •  r  .1  I  I    ID    .  I  :i    '  1  1  t  1  (  M  ♦  Ml'    '    I •  t  V.'.  •  •  -  n    t  .i '  •  i  .  i  I    <  ]  r  ^  in      ,     »    f  i  f  i  - )  i  u  - 1  in 
t  i  K    M  ■    s  t  u«  i  1  <*s  . 

Considerations

Based upon work in this area, Jensen (1980) concluded
that there is no support for the assumption that the race of
examiner is an important source of variance between whites
and blacks on measures of mental ability (p. f>('/-r>M^). While
such a conclusion seems possible at this stage of knowledge,
several conceptual issues must be raised in this literature.
To begin with, there are problems with attempts to compare
different studies using a "box score" approach (Kazdin &
Wilson, 1978). It is not possible to review all the
difficulties with the box score approach here and the
interested reader should consult Kazdin and Wilson (1978) for
an excellent discussion of these problems in the
psychotherapy literature. However, some points might be
raised. [Actually, many of the criticisms of the box score
approach also apply to the meta-analysis (cf. Smith & Glass,
1977) alternatives that are proposed for literature reviews
as well.] First, a series of studies that are
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.1  r  f  >    '!,  1.  1.      1  M  •  t  ..  ,     n    "  I  -  >  1  "     mi    "  ! .  i  I  "       t  i  >  t  i  ■      .       1    *•  •  m    v  i  t  Ij  1        I  i       :  - 

•  •  t  r  I   . ;  '  1  ■    ■  ,     '   '  1  i  r   I    ■  [ »  t  '  r  ' .  ,       !  i  i  1  -  i  t  *  1 1  ,      i  ;  i  1      i  ■  t      i  ,  i .  r  (     ■  1 1 .    i  .  i  m    ■  ■ . 

(  1  .  -    .  ,       <    ■■     i]  '  ' .     ;  f    '  *  M : :  i  ♦  :  ■  '     . ; !  •  1  ]  i  '  ■/  '     r  .  f  1  I  ■  »     .  i  n  \'    '  i  r 
'  • '  >  n '  ■  1  M  ' .  1   " )  • .    >  '     •  [  I  '  M  :  1  t  . 

"     t  S  M  >  '     r     )  :  ,\     :  > '  M  •  .  !  .    ■  *  '\     t  ' ; .  ■    !  .  ,  1  •  ' :  '  r    >  i    h     i  ■ , 

1  '  .  !  t  ,  f   1  '  1  '  Mil'!  M  .  ■     ■   I    1 '      '  ■  ;   •  t      ♦  ( 1  •       .  1  '  '  '  . 

':•.!!.,  r  <  ■  '  '  {  i  ■  ■  I  •  ■  r  "  I !  1  '  "  r  i  ■■  •  '  <  ■  * ,  i  !  !  "  .  i  r  «  •  r  >.  >  y  •  .  j  -  ,  »  ,  ■ . 
homogeneous. But minority group children represent a
heterogeneous population, so it becomes difficult to know
what group of ethnic children should be so classified
(Sattler, 1974). For example, Valentine (1971) found 14
different Afro-American subgroups in one urban community and
each had more or less distinct cultures. Sattler (1974) also
noted that the label "minority group children" is typically
used to designate individuals whose values, customs, patterns
of thought, language, and interests are different from the
dominant culture in which they live (Liddle, 1967). Included
within such a conceptualization would be groups including
blacks, Mexicans, Indians, Puerto Ricans, and various
subculture whites (Appalachians, foreign born, unskilled
laborers, and so forth). The point is that observed effects

Meyers et al. (1974) noted that "race of examiner" may
not in and of itself lead to deviant responses by a minority
child. Negative responses may be elicited when certain modes
of interaction are initiated (e.g., outright expression of
disappointment). They note that personal examining style and
the milieu created by a white examiner may be more related to
testing behavior than examiner race per se (e.g., Bucky &
Banta, 1972; Yando, Zigler, & Gates, 1971).

Finally, it is possible that a box score strategy
obscures certain developmental patterns that may operate in
this area of research. For example, Epps (1974) noted that
data from various studies of this area indicate that the age
of the examinee may mediate the race of examiner effect.
Thus, it is possible that any negative impact of examiners of
a different race on black and on white children is strongest
in the early years. However, in later years the negative
impact may decrease and the difference can have a
facilitating effect (cf. Katz, Atchison, Epps, & Roberts,
1972). In testing situations where no whites are present, the
belief that they are competing with whites rather than with
other blacks may have an effect on a black student's
performance (Epps, Katz, Perry, & Runyon, 1971). It is
possible that with black examiners, the implied comparison
may enhance
performance. It is also possible that the nature of the
effect of implied white comparison is mediated by the
student's perception of the probability that (s)he will
succeed. When the probability of success is relatively high,
white examiners have a facilitating effect; when the
probability of success is relatively low, black examiners may
have a facilitating effect. While some studies may support
this (e.g., Savage & Bowers, 1972; Watson, 1972), Epps (1974)
notes that this area needs to be clarified. Also, the
relation between the task itself and the race of examiner and
race of comparison effects should be further clarified in
empirical research.

Sex of Examiner

Issues

Several reviews have focused on the sex of the examiner
and its possible influence in intellectual assessment (e.g.,
Jensen, 1980; Rumenik, Capasso, & Hendrick, 1977; Sattler,
1974). The general consensus from this literature is that
there are no consistent effects of the sex of E. However,
Jensen (1980) concluded that some evidence suggests that
female Es tend to elicit higher performance than male Es from
both males and females.

Epps (1974) has further noted that there is really
little known about how the sex of E affects the performance
of children or how the E's sex interacts with the S's sex in
multiracial or multisocial settings. Research may be limited
because either investigations have involved only male

Language

Issues

Language and related factors (e.g., dialect) have often
been examined as they relate to possible bias or
discriminating effects in assessment (Jensen, 1980;
Kratochwill et al., 1980). Language was considered an
important assessment issue as early as the 1910s, when large
numbers of immigrants came into the United States. In order
to make assessment less discriminatory or biased, tests or
test directions have been translated into the "primary" or
"dominant" language of the client. Several tests (e.g.,
WISC, Wechsler, 1949; Illinois Test of Psycholinguistic
Abilities, Kirk, McCarthy, & Kirk, 1971) have been translated
into another language (e.g., Spanish), but the number of such
translations used in the U.S. is relatively small.
Nevertheless, the tactic of translating tests into the
presumed primary language of the client is one criterion for
nondiscriminatory assessment in PL 94-142.

Based on considerations of what effect the language of
the examiner or of the test itself may have on the
performance of children from a bilingual or non-English
background, Jensen (1980, pp. 605-606) drew the following
conclusions:


Japanese) generally obtain higher scores on nonverbal and
performance tests than on an English verbal test, oral or
written.

2. On English language tests of scholastic achievement,
students with foreign language backgrounds usually perform
better on arithmetic than on language items.

3. Generally, Mexican-American individuals score higher
on the Wechsler and Stanford-Binet IQ tests when these are
administered in Spanish rather than in English.

4. Generally, the language spoken by the examiner makes
less difference on performance on nonlanguage tests than on
verbal tests (oral or written).

5. When Spanish-speaking Mexican-Americans are equated
with Anglo whites and Orientals on socioeconomic status, the
lower performance of the former is greatly reduced.

6. Mexican-American children from bilingual homes where
both Spanish and English are spoken typically perform better
on various standardized tests than children from homes where
Spanish is spoken exclusively.

7. Overall, the language of the test or examiner makes
less of a difference on performance the longer the child has
attended English language schools. Also, the differences
usually found between verbal and nonverbal tests decline
with the increasing number of years in school.
The assessment of bilingual children presents special
considerations. Although a test given in English to bilingual
children is a relatively good short-term predictor, such
tests should not be used for predictions for special classes
when a placement of more than one year would be made.

In order to be sensitive to bilingual students, a common
strategy is to translate the test or administer it in more
than one language. However, there are several difficulties
that may emerge when this alternative is pursued (Kratochwill
et al., 1980). To begin with, the examiner must first
determine the primary or dominant language of the child.
This is not straightforward. The lack of adequate language
assessment instruments has often hindered assessment efforts
as well as the implementation of special language programs
and identification of eligible students. The major problems
include (1) determination of what language skills and
linguistic structures to describe and (2) the identification
of adequate tools or instruments to measure language
(Silverman, Noa, & Russell, 1976). These authors published
the Oral Language Tests for Bilingual Students in an effort
to address the policy advanced in the Bilingual Education Act
of 1974. Silverman et al. (1976) evaluated various language
assessment devices on dimensions of validity, technical
excellence, and administrative usability. The evaluation
was conducted on commercially available tests, tests under
development or undergoing field testing, and tests used for
experimental purposes. Of the tests reviewed, only a very few
can be used for languages other than Spanish (e.g.,
MAT-SEA-CAL Oral Proficiency Tests, 1976). Another problem
was that the tests reviewed had a restricted age/grade range.

/  / 

The concept of bilingualism also presents other
difficulties in a practical area. Some children may use
English in school and Spanish outside school (home and
community). Such children may fail to develop a sufficient
mastery of either language (Sattler, 1974). For example, in
some studies in this area, Spanish has been used either in
test directions only or in the complete test to administer
standardized intelligence tests to Spanish-speaking children
(e.g., Chandler & Plakos, 1969; Galvan, 1967; Holland, 1960;
Keston & Jimenez, 1954). After reviewing these studies,
Sattler (1974) suggested that such procedures are not only
fraught with hazards, but that "translations of a test makes
it a hybrid belonging to neither culture" (p. 39).
Furthermore, whether or not bilingualism will constitute a
problem for the child will depend upon the way the two
languages are acquired (Anastasi & Cordova, 1953). Sattler
(1974) also argued that a child who learns two different
languages (i.e., one at home and another at school) may have
more problems than the child who learns one language that is
expressed across all situations.

While the issues surrounding bilingualism are less than


c 


oT;  I'^l  2  ..■•.i  1  o    Imouac^--   asr,essmcMit    anil   could   even   he   rela'torl  to 


observed speech difficulties (Sattler, 1974). Specifically,
it is possible that various patterns of speech developed in
the use of one language can interfere with correctly speaking
another (Beberfall, 1958; Chavez, 1956; Perales, 1965).
Children may never become proficient in speaking either
language (Holland, 1960), and in the case of Spanish-speaking
groups, children may borrow from a limited English vocabulary
to complete expressions begun in Spanish. They may give
English words Spanish pronunciations and meanings and they
may have difficulties in pronunciation and enunciation
(Perales, 1965).

In summary, translations of a test may provide a
promising alternative to reduce bias in the assessment
process. However, mere translation of the test into the
"primary" language of the child has several conceptual and
methodological problems that have not been adequately
addressed in research in this area.

Some minority groups speak English, but there is a
clear dialect difference from standard English. For example,
many black children speak a form of black dialect English
that varies considerably from the standard English spoken by
many white children. Oakland and Matuszek (1977) noted that
language biases may be encountered in assessing black
children who manifest elements of nonstandard dialects,

significantly different from those manifested by blacks (and
other minorities), in which language patterns also are
ordered and rule governed (Bartel, Grill, & Bryen, 1973;
Gay & Abrahams, 1973). Dialect differences are not limited
to racial groups. Many whites from certain parts of the
country or various SES levels speak with a dialect that
varies from that spoken in the majority white culture. The
issue, no matter what the dialect, is whether or not
differences on this dimension influence performance on
standardized tests in a way that will bias decisions.

A point has been made that even if English is the
primary language, there are considerable variations among
cultural groups in terms of complex language idioms,
colloquialisms, words and phrases with multiple meanings, and
words and phrases of similar but not identical meaning within
a language (Garcia, 1976). It has also been verified that
even if English is the primary language, testing procedures
may not equate for differing cultural or subcultural
information, learning strategies, and value systems (Alley &
Foster, 1978).

Nevertheless, Jensen (1980) noted that a number of
studies have suggested that black children comprehend
standard English at least as well as their own nonstandard
dialect and that their understanding of standard English
occurs at an early age (e.g., Eisenberg, Berlin, Dill, &
Sheldon, 1968; Hall & Turner, 1971, 1974; Harms, 1961; Kraus

in this area, the empirical literature provides no support
for the effect of dialect (Jensen, 1980). In an early
study, Crown (1970) studied the effect of language dialect
(black versus standard English) and race of examiner (two
black and two white examiners) on the Wechsler Preschool and
Primary Scale of Intelligence (WPPSI). The results showed
no effect of dialect and no interaction with race of examiner
or race of student. Similar results that do not support a
dialect effect have been found in a series of studies by Quay
(1971, 1972, 1974) on the Stanford-Binet Form L-M. Thus,
results of empirical work in this area do not support the
notion that dialect influences test performance. However,
there is relatively little work in this area.

Bias in Test Scoring

Scoring bias refers to any systematic error that occurs
in deriving the scores from the test (i.e., systematic errors
in scoring). Research in this area has usually found some
halo effects on such tests as the Stanford-Binet and Wechsler
scales. For example, in the usual procedure in this area,
examiners are given expectancies that a child is bright or
dull with various ambiguities in various response items. In
some studies where expectations are manipulated, examiners
tend to overrate ambiguous responses for high expectancy

Hillix, & Neher, 1970; Simon,

There are at least three problems in work in this area
(Jensen, 1980). First of all, research has generally
produced effects that are of little or low magnitude.
Second, and perhaps more important, the research has usually
been conducted under more analogue conditions. The
expectancies are contrived and the study is conducted under
laboratory or non-field conditions. Thus, it is not at all
clear that the results would occur under conditions present
where IQ tests are usually administered (e.g., school
settings). Finally, studies demonstrating halo effects have
usually failed to determine if test validity is compromised.
Jensen (1980) notes:

The most telling experimental paradigm, which has never
been applied, would be to substitute a small number of
ambiguous responses made in authentic test protocols
ranging widely in total score and note the degree of
discrepancy between ratings given to the substituted
ambiguous responses and the ratings given to the S's
actual responses on these items. Based on probability,
it is likely that the halo effect, on the average,
enhances the scoring validity of highly ambiguous
responses (p. 61(^).

Some research has also been conducted on the effect of
scoring bias as a function of race (e.g., Jacobs & DeGraaf,

There was no significant interaction with race of subject or
race of examiner, and no interactions among these factors.

There is also some evidence to suggest that examiners
give higher estimates of intelligence for black and
Mexican-American children than for white children with the
same measured IQ (Nalven, Hofmann, & Bierberger, 1969;
Sattler & Kuncik, 1976). Jensen (1980) noted that such
results may indicate that some psychologists either accept
the notion that tests underestimate the IQ of minorities or
that more weight is given to various ability factors.

Bias in Observational Assessment

Bias in assessment is not limited to standardized
ability measures as usually conceived. Such assessment
procedures as direct observations in naturalistic settings
have also been examined for bias. Indeed, a number of
authors have provided reviews of this literature discussing
such factors as interobserver agreement, training observers,
code complexity, and communication among observers (e.g.,
Johnson & Bolstad, 1973; Kazdin, 1977; Kent & Foster, 1977;
Wasik & Loven, 1980; Foster & Cone, 1980; Wildman & Erickson,
1977; Haynes & Wilson, 1979).

One strategy to investigate bias in observational
assessment is to create expectancy instructions regarding
changes in the clients being observed. In an early study

either increase (group 1) or decrease (group 2) during

treatment. A third group was told that the researchers were
unsure of the effects of the treatment. The authors found
that all groups recorded a decrease in disruptive behavior
with the treatment, with the largest effect occurring in the
groups of observers who were provided the expectation that
the frequency of the disruptive behavior would decrease.
Although it is possible that this effect exists, this study
has been criticized on methodological grounds (e.g., Johnson
& Bolstad, 1973; Kent, O'Leary, Diament, & Dietz, 1974) and
has not been replicated in two attempts (Kent et al., 1974;
Skindrud, 197?).

Bias in observational assessment has also been studied
by providing observers differential feedback concerning their
conformity to ratings provided by the experimenter (O'Leary,
Kent, & Kanowitz, 1975). In the study, observers were told
that a decrease was expected in the frequency of occurrence
of two categories of behavior. They were also informed that
no change was expected in the frequency of the other
categories. The authors found significant decreases in the
frequency of the categories for which a decrease in frequency
was predicted. Also, no differences were found for the two
other categories. Thus, differential feedback and prediction
of a decrease led to biased assessment. However, a control
group which received no feedback should have been included to

■  ,  ;i))r;      <  >M     !  n ' :  M  1 i  t  ri''- */  Dr ''•d  i    1  ]  o^r;    .ilori'"  (V.Mjdm.in 

.1;-    ■■   v-p;  -i    \<o  I  k    \  :)    t,  h  i  s  .'onv»\>)    s '^''i  e  t:  h  i  iv ;  .^1' 

the number of factors that may bias observational assessment.
Specific recommendations for obtaining more valid and
reliable data through observational procedures are presented
in Chapter 7 (pp. PHO-n^O).

Bias Due to Timed vs. Untimed Testing

Whether or not a test is timed or untimed (speed vs.
power tests) has sometimes been postulated as a factor
contributing to bias in testing. However, empirical work in
this area has not supported this possible source of bias.
For example, Dubin, Osburn, and Winick (1969) studied the
effect of time limits on the testing of black and white high
school students of low and high SES groups. The authors
found that both whites and blacks obtained higher scores as a
function of practice and an extended time limit. Thus, the
findings indicate that black subjects (and low SES) were not
penalized when given no extra practice for speeded tests.

Although there is no evidence for bias through time
factors in tests, there is very little work in the area.
Jensen (1980) noted that two types of speed factors have been
identified following Spearman's (1927) work in this area.
One, "speed of cognition," refers to the speed with which an
individual recalls relevant information for answering a


individuals have a preference for speed in performing a
certain task, a speed factor Jensen (1980) has labeled
"personal tempo." However, Jensen (1980) noted that no
evidence supports the notion that a personal tempo factor (in
contrast to cognitive speed) contributes to any meaningful
difference between test scores of various racial and
socioeconomic groups.

In this chapter we provided an overview of situational bias in
assessment. As noted in the introduction to this chapter, situational
bias in assessment refers to those features that are a part of the
assessment process but have not been specifically considered in terms
of the technical test bias features described in the latter part of the
report. In the chapter we provided an overview of test sophistication.
This area included practice, coaching, test sophistication, and
interaction with race and social class. We noted that generally there
is a paucity of information to suggest that these features bias tests
in any systematic way. Nevertheless, there is need for future research
in this area.

A host of motivational and situational factors in the assessment
process were reviewed in the chapter. These included such things as
motivational components (e.g., reinforcement and incentives),
situational factors, test anxiety, and a number of other variables
including achievement motivation, self-esteem, reflectivity, and
impulsivity. In this area we were impressed with the lack of empirical
information pointing to any strong influence of motivational and
situational factors. Nevertheless, we must emphasize that the fact
that studies are not supportive of one particular direction does not
necessarily mean that these factors can be eliminated as potential
candidates for assessment bias. Indeed, in some areas such as the
reinforcement literature, the motivational components have never
really been adequately tested due to problems in the way studies have
been conceptualized. It appears that an individual analysis of
motivational factors could likely yield important data regarding the
influence of these factors in test bias. Nevertheless, we must again
point to the need for future research to further elucidate different
motivational and situational factors in the test bias literature.

In the next area race of examiner was discussed. We conclude, as
have other researchers in this area, that there are insufficient data
at this point to draw firm conclusions regarding a race of examiner
effect. Similar conclusions can be drawn in terms of sex of examiner
issues.

Another area where situational bias has been examined is in
language considerations. Language has been explored in more detail
than some of the other areas, but again, there are very few studies
that indicate that language is a sole biasing feature when other
variables are considered. However, at the most straightforward level,
administering a test in English to a child whose language is other than
English certainly could be considered bias in assessment. Yet, when
some language factors are considered, the role of language factors in
assessment bias becomes even more complex. We pointed to some areas of
future research in this area hoping that some new areas of investigation
could be opened.

Finally, several other areas of potential bias in assessment were
discussed, including bias in test scoring, observational assessment,
and potential bias due to timed vs. untimed testing. Work in each of
these areas is relatively primitive at this time. However, at this
point there is no clear evidence that these factors have resulted in
systematic situational bias in assessment.

After reviewing the rather extensive literature in this area, we
have to conclude that it is not a matter of more research in each area
but rather the specific type of research that needs to be conducted in
the future. Various areas for future research were outlined.

Outcome Bias

The concept of validity as traditionally conceived
focuses almost exclusively on how well a test measures the
construct or latent trait it purports to measure. The need
for such validation work is obvious. The goal in test
development, albeit never reached, is to create a test that
is perfectly correlated with the construct it measures.
Validation efforts, traditionally conceived, focus
exclusively on demonstrating the correlation between the two.
With respect to bias, the research efforts discussed in
Chapters 4 and 5 have examined the validity of tests to
determine if they are measuring the same construct across
groups and, if so, whether they are equally valid for all.

From an alternative perspective, that concept of
validity which focuses on demonstrating the correlation
between the test and the construct it purports to measure can
be viewed as parochial. It can be and has been argued that the
concept of validity should be broadened (Cole, 1981;
Cronbach, 19?n; Messick, 1975). While studies conceived
within the parochial concept of validity provide us
information to help explain why an individual performs the
way he/she does on a test (i.e., he/she possesses the
construct to a certain degree), they tell us practically
nothing about the valid use of the test. When tests are used
for decision making that results in the selection of
individuals, the planning of treatment for individuals, or
both, it is vital that information be available that provides
the decision-maker(s) the most valid information on which to
base a decision, that is, information that predicts the
desired outcomes with the least amount of inference. When we
add the notion of test use to our concept of validity, by
necessity, we focus on whether or not the outcomes of the
process employing the test are desired. In order to collect
data on the validity of a test under this broadened concept
of validity, one would have to know precisely the desired
outcomes, that is, the purpose for the test's use.
Consequently, under the broadened conceptualization, there is
no such thing as a valid test; only tests that are to some
degree valid for a purpose. Likewise, a test can be both
valid and invalid according to the decision one makes
with it and the desirability of the outcomes of those
decisions. In addition, under the broadened concept of
validity, when we want to study bias we are interested in
whether or not the outcomes are the desired outcomes for all
groups.

When traditional validity approaches use external
criteria against which to validate tests, the effort can be
viewed as an attempt to demonstrate the test's relationship
to criteria with which the construct is hypothesized to be
associated. Predictive validity studies that employ external
criteria do have a practical side to them since the external
criterion employed is often an important criterion to which one
predicts in decision-making. For example, validating an
intelligence test, in part, by demonstrating its relationship
to a standardized academic achievement test not only shows
the test is acting the way it should, given the construct it is
measuring, but it also provides information on the
relationship of the test to an important criterion (e.g.,
academic achievement). Such information is useful when making
decisions about special education placement. Information on
this relationship helps us increase the probability of making
a correct decision.

Yet, when we take a closer look at a typical decision of
this sort and focus on the intended outcomes, we see the
large inferences in the interpretation of test data when we
rely solely on information provided by predictive validity
studies. Three major areas in which we lack information can
be identified.

Predicting Specific Outcomes

The first area where there is a paucity of information
involves how much is known about the relationship between the
desired outcomes and the test. For inferences presently made
with tests to be reduced in size, one needs to clearly
identify all desired outcomes and have empirical information
on the relationship of the test to a criterion that
defines the outcomes. In our example above, if one desired
outcome is to provide an effective intervention through
educational placement, then one needs to gather information
on how well the test predicts the effectiveness of the
placement (i.e., the desired outcome) with effectiveness
embodied and defined in a criterion or set of criteria. It
is evident that information from predictive validity studies,
traditionally conceived, provides minimal information on the
use of tests for this purpose. Predictive validity
information does provide the advantage of making predictive
statements regarding how well the child will perform in the
future on standardized tests of academic achievement but
provides no information on the test as it relates to the
effectiveness of the placement.

With respect to the latter, predictive validity
information tells us how well the child will perform without
placement; it does not tell us how well the child will
perform with placement. If the desired outcome of a
placement decision is to help the child learn more
effectively, then being able to predict this from a test is
of much importance to the decision maker and within the
purview of a concept of validity, broadly defined.

In addition to the above problem in predicting to one
criterion (e.g., standardized achievement tests) across
different conditions (i.e., with and without placement) is
the problem of the criteria to which one predicts.
Certainly, when deciding on special education placement,
standardized achievement tests are only one measure of
academic performance that can be used. Others, such as work
samples, actual learning, and teacher ratings may also be
important to use as a criterion measure. This is a question
of the validity of the criteria used, and such validity can
only be determined after a thoughtful analysis of the purpose
of assessment. Once a criterion or set of criteria is
decided upon, the predictor measure (e.g.,
intelligence test) needs to be validated across placement and
nonplacement situations to judge the efficacy of the
predictor. Note that in traditional predictive validity
studies the criterion is customarily chosen in keeping with
the purpose of the validation effort, that is, to show the
test predicts the criteria it is hypothesized to predict, not
necessarily in accordance with the purpose of any decision to
be made.

When one's notion of validity includes outcomes, it may
broaden the definition of and consequent search for bias in
assessment. The concern for bias under such circumstances
would entail whether or not the test is valuable in
predicting equally well the effectiveness of placement across
groups. Once an appropriate criterion or set of criteria
has been identified, a test used to predict the
effectiveness of placement should be demonstrated to be
equally effective for both minority and nonminority children.
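
As an illustration only, the comparison described above might be
sketched as follows: compute the test-criterion correlation separately
for each group and compare the coefficients. The data, the group
labels, and the function names (`pearson_r`, `validity_by_group`) are
hypothetical, and the criterion is assumed to be some agreed-upon
measure of placement effectiveness. A real study would also examine
slopes, intercepts, and sampling error rather than raw coefficients.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def validity_by_group(records):
    """Map each group to its test-criterion correlation.

    records: iterable of (group, test_score, criterion) tuples, where
    criterion is a hypothetical measure of placement effectiveness.
    """
    by_group = {}
    for group, test_score, criterion in records:
        scores, outcomes = by_group.setdefault(group, ([], []))
        scores.append(test_score)
        outcomes.append(criterion)
    return {g: pearson_r(s, o) for g, (s, o) in by_group.items()}


# Invented data: (group, test score, gain observed after placement).
records = [
    ("minority", 70, 4), ("minority", 80, 6), ("minority", 90, 8),
    ("nonminority", 70, 5), ("nonminority", 80, 9), ("nonminority", 90, 7),
]
coefficients = validity_by_group(records)
gap = abs(coefficients["minority"] - coefficients["nonminority"])
```

A large gap between the group coefficients would suggest that the test
does not predict placement effectiveness equally well for both groups,
which is the concern raised in the paragraph above.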

Tests Within the Assessment Process

The second area in which there is a lack of information
involves the various forms of data that are considered in
decision making; that is, in studies of the assessment
process. As conventionally defined, an assessment process is
the process of collecting data for decision making (Cancelli
& Duley, in press). Discussions of validity to this point
have focused on the validity of tests. However, tests are
but one element of the assessment process. A wealth of data
are usually employed to predict outcomes and to make
decisions. In our example above, the decision to place a
child in a special education class involves, by law, the
efforts of a multidisciplinary team. Each member brings with
him/her relevant (and unfortunately irrelevant) data for
predicting outcome. In addition, some of the data are often
subjective and highly situation specific. Data can include
the subjective impressions of the personality characteristics
of the special education teacher, the student, and the
interaction of the two, the characteristics of the students
in the class with whom the child may be placed, the
cooperation of the parents with the placement and so forth.

How all these data fit together within the dynamic
process of teaming in making a prediction about the success
of placement is, of course, a validity question when one
includes in the concept of validity evidence of the
effectiveness of decisions as judged by desired outcomes.

Other Considerations Within the Decision-Making Process

The third area in which we have little information
involves how considerations other than those drawn from
psychological and educational data are brought to bear on the
decisions that are made. These considerations often are
independent of the predictions of the effectiveness of
outcomes as determined through psychological measures. Some
of these issues evolve out of ethical, moral, and legal
standards and the value of their use is judged in terms of
social value (Messick, 1975). The societal impact of placing
a disproportionate number of minority students in classes for
the mentally handicapped is an example of one such
consideration.

Other such considerations evolve out of practical
features of assessment. Whether a more valid/less
cost-efficient or less valid/more cost-efficient assessment
battery should be used is one example of a practical
consideration. Still other considerations stem from our
concern for the integrity of the decision-making process.
These issues often involve the decision maker(s)' concern
for the less than perfect reliability and validity of the
predictors employed and their yet unidentified biased nature.
Such concerns also bear the potential of impacting on
decisions.

Decisions regarding whether or not to consider these
factors have to evolve out of a clear understanding of the
purpose and desired outcomes of assessment. Their use
cannot be determined by scientific inquiry. They are value
judgments based on such concepts as equality and fairness.
As such, they are not validity questions in a traditional
sense. Yet, [...] these considerations do
impact whether or not certain desired outcomes come about
(e.g., proportional representation) and in that sense are
questions of validity. Thus, they are just as important, if
not more important, to understand than are data collected
from tests and other aspects of the assessment process
discussed above. The importance of understanding such
considerations lies in the amoral nature of psychological
assessment data. For example, a technically unbiased test
does not guarantee that its use will not result in socially
undesirable consequences. The question as to what is
socially desirable, although based on values, should be a
concern to all those involved in the assessment process.

As one can readily see, these considerations could be
employed to have an impact on the desired outcomes of
decisions as they relate to members of various minority
groups. As identified above, for example, decision makers
may wish to have as an outcome proportional representation
of minority and nonminority children in classes for the
mentally handicapped. Such a consideration would have
nothing to do with improving predictions that are made
regarding the effectiveness of placement but nonetheless may
impact on the decision not to place certain minority students
in such classes.

Selection versus Intervention

When we turn attention in assessment to the study of
outcomes, the types of decisions made with the data need to be
precisely identified. While the validity of any test or
assessment process must be judged within the context of the
decision to be made, there are certain types of decisions that
can be identified for study. Two such types of decisions are
those of selection and intervention.

The major difference between these two types of decisions is
in their purpose. Selection decisions are those that require
the decision maker(s) to choose whether or not the individual
assessed should be selected. Common types of selections that
employ psychological data in the process are for the purpose
of employment and admissions. Both involve decisions to
include or not to include, and the effectiveness of the
decision and antecedent assessment process is determined by
whether or not those who are selected are those whom the
decision makers want to select. For example, in the area of
employment testing the effectiveness of a decision to select
someone for a job needs to be judged against what the desired
outcomes of that decision are. If an employer chooses to
select only those applicants who have the best chance of
succeeding on the job, then effectiveness needs to be judged
by how well the battery of predictors accomplishes that goal.
If another employer wishes to choose those most likely to
succeed within certain defined racial/ethnic groups so as to
maintain proportional representation among these various
groups, then the effectiveness of the decisions and assessment
process needs to be judged against that desired outcome.
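
The two employers' goals can be pictured, purely as an illustration,
as two different selection rules applied to the same predicted-success
scores. The function names (`select_top_down`, `select_within_groups`),
the applicant data, and the quota values below are hypothetical and are
not drawn from any selection model reviewed in this report.

```python
def select_top_down(applicants, n):
    """Select the n applicants with the highest predicted success.

    applicants: list of (name, group, predicted_success) tuples.
    """
    ranked = sorted(applicants, key=lambda a: a[2], reverse=True)
    return [name for name, _, _ in ranked[:n]]


def select_within_groups(applicants, quotas):
    """Select the highest-scoring applicants within each group.

    quotas: dict mapping group -> number of people to select from it,
    chosen (by assumption) to give proportional representation.
    """
    chosen = []
    for group, k in quotas.items():
        members = [a for a in applicants if a[1] == group]
        members.sort(key=lambda a: a[2], reverse=True)
        chosen.extend(name for name, _, _ in members[:k])
    return chosen


# Invented applicant pool: (name, group, predicted success).
applicants = [
    ("a", "X", 0.9), ("b", "X", 0.8),
    ("c", "Y", 0.7), ("d", "Y", 0.6),
]
best_overall = select_top_down(applicants, 2)
proportional = select_within_groups(applicants, {"X": 1, "Y": 1})
```

With these invented scores the two rules select different people, and
each result is "effective" only relative to the outcome its employer
desired, which is the point made in the paragraph above.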

Intervention decisions, on the other hand, have a
different purpose and therefore different desired outcomes.
The purpose of intervention decisions is to provide
successful forms of treatment as a consequence of the
decision. The type of information desired for intervention
decisions is that which will enable the decision makers to
better choose an intervention and consequently better predict
the desired outcome (i.e., effective intervention). All
intervention decisions are preceded by selection. Although
selection decisions are a prerequisite to intervention
decisions, it is important to maintain the distinction for
two reasons. First, the information necessary to validate
the decisions is different. When making decisions regarding
intervention it is necessary to know if the desired outcome
of the intervention can be predicted from the test. In
selection decisions it is only necessary to use tests that
will predict who will or won't succeed without intervention.
Second, the distinction is crucial since different tests can
be valid for making different decisions. A test that is
valid for predicting future performance on some criterion may
be of no value in predicting success of an intervention
designed from its use. This point is clarified by closely
examining the selection and intervention decisions involved
in making an educational placement in a class for the
mentally handicapped.

Somewhat different from employment selection decisions
which focus on a test or assessment battery able to predict
future performance, the selection process in special
education involves the use of [...]. Not
only do we have to predict future performance on a criterion
but we also have to explain why the individual performs in
certain ways. Assessment data employed to make decisions for
the selection for the placement [...]
to predict who will succeed and who will not
succeed, and, in addition, tell why those selected didn't
succeed. Consequently, because of the diagnostic
requirements, tests must demonstrate good construct validity
as well as validity in predicting criteria relevant to the
decision maker.

It is at this point that the influence of considerations
other than those derived from test data impact on the
decisions to bring about additional desired outcomes, such as
proportional representation. Once the decision is made to
select, then an intervention needs to be decided on. In our
present example this decision usually involves placement in a
class for the mentally handicapped. In addition, by law, the
intervention decision must be more specific than just
placement and include the specification of objectives and
instructional strategies to reach the objectives. The
assessment data employed to make such intervention decisions
must give the decision-maker the ability to predict the
success of the intervention. This test is usually different
from that used for the selection decision. For example,
intelligence tests provide the decision-maker with the
ability to predict to a moderate degree future academic
[...] explanation for why success is or isn't predicted. However,
there is no empirical evidence to indicate that employing
data derived from an intelligence test is of value in either
predicting the success of placement in an EMR class or
designing specific intervention strategies. Thus, from a
broadened conception of validity, under such circumstances,
an intelligence test is both valid and invalid: valid in
selection and invalid for making intervention decisions.

The remainder of this chapter focuses on that literature
which addresses bias in outcomes for each of the two classes
of decisions, selection and intervention.

Selection Bias

The major contribution to the area of selection bias
comes from those who have studied the various models that can
be used in selection that take into account the social value
considerations that we spoke of above. This literature has
mostly addressed decisions in the employment and admissions
area and has been discussed under the headings of fairness in
selection and bias in selection.

Since the models discussed in this literature emanate
from considerations of the abstract concept of fairness as it
relates to the selection of minority and nonminority
applicants, there is no research available to tell us which
model is better than another. There is no model that is more
fair or less fair than another. With respect to the concept
of validity, the various models can either add validity to
the decisions [...]. Similarly, with respect to bias, the
models are nonbiased if they result in the desired outcomes
with regard to group membership. The choice of the desired
outcome is a value judgment and an issue of fairness; the
issue of whether or not the model yields the outcome is
within the purview of a broadened concept of validity and is
an issue of bias.

Since all the models are based on various notions of
fairness, their evolution is based on various philosophies of
fairness. Three philosophies of fairness in selection
important to our present discussions have been identified by
Hunter and Schmidt (1976): unqualified individualism,
qualified individualism, and quotas.

Unqualified Individualism. This philosophy maintains that
any predictor variable or set of predictor variables,
regardless of the nature of the variable, should be used to
predict a criterion if it improves prediction. Predictors
may include such data as test scores, demographic information
regarding the individual's race, religion or socioeconomic
status, and biological information including sex and
handicapping conditions. The one stipulation is that the
information used must increase prediction of the desired
outcomes, and desired outcomes are only those that relate to
performance on a criterion. So, for example, an employer
adopting such a philosophy would design an employment battery
to collect data on all those variables that are going to help
[...] on the job. The practical utility of including predictors
that add minimally to the prediction is left to the
discretion of the employer. Group membership only comes into
play if such membership helps improve the prediction for that
individual. If predictors function differently in their
relation to the criterion or criteria across various groups,
then the best regression equation for each group is used.
Individuals who score on the predictor(s) at a level that
would allow prediction at a minimally acceptable level on the
criterion are accepted, while those who don't are not. If
only a few applicants can be selected, then the applicants
are selected from the top down from a list ranking the
applicants in terms of their predicted performance without
regard to group membership.
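
The selection procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the group labels, the regression coefficients in GROUP_EQUATIONS, and the helper names are hypothetical and do not come from the literature reviewed here. Each applicant's criterion performance is predicted with the regression equation for his or her group, applicants below the minimally acceptable level are dropped, and the openings are filled from the top of a single ranked list.

```python
from dataclasses import dataclass


@dataclass
class Applicant:
    name: str
    group: str
    score: float  # predictor test score


# Hypothetical group-specific regression equations: (intercept, slope).
GROUP_EQUATIONS = {
    "A": (10.0, 0.50),
    "B": (20.0, 0.40),
}


def predict_criterion(applicant: Applicant) -> float:
    """Predict criterion performance with the equation for the applicant's group."""
    intercept, slope = GROUP_EQUATIONS[applicant.group]
    return intercept + slope * applicant.score


def select_top_down(applicants, openings, minimum_criterion):
    """Drop applicants below the minimum, then fill openings from one ranked list."""
    qualified = [a for a in applicants if predict_criterion(a) >= minimum_criterion]
    ranked = sorted(qualified, key=predict_criterion, reverse=True)
    return ranked[:openings]


applicants = [
    Applicant("p1", "A", 100.0),
    Applicant("p2", "A", 80.0),
    Applicant("p3", "B", 110.0),
    Applicant("p4", "B", 60.0),
]
selected = select_top_down(applicants, openings=2, minimum_criterion=45.0)
print([a.name for a in selected])
```

Group membership enters only through the choice of prediction equation; the ranking itself is made on predicted performance alone.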

There are several advantages to such a philosophy. The
most obvious advantage is that it guarantees the selection of
only those who have the best predicted chance of succeeding
on the criterion. Group membership does not enter into the
decision once the applicants are ranked, thus avoiding a
situation where one applicant is chosen over another solely
because of group membership and not merit (as defined by
performance on the criterion or criteria). It is argued by
those who advocate such a philosophy that this gives members
of all groups an equal chance of success that would not be
predicted if special advantage was given to one group over
another. [...] differences in [...] across groups [...]
closely approximating equality in future chances of promotion
or graduation across groups, and reducing group differences
in failure or frustration.

The main objection of those who find such a philosophy
troublesome lies in those instances where single-group or
[...] circumstances, equally capable individuals from the group for
which there are no valid predictors, or less valid than
predictors for other groups, will have less chance of being
selected. Under extreme conditions where a battery has
little or no predictive utility for a group whose mean
performance falls below the acceptable cut-off, no members of
that group will be selected even if they are capable.
Conversely, if the mean performance of a group on a battery
that has no predictive utility for the criterion or criteria
of concern falls above the cut-off, all members of the group
will be accepted.

Qualified Individualism. This philosophy is very
similar to unqualified individualism except in one major
respect. Those who hold a philosophy of qualified
individualism advocate the use of the best predictor or set
of predictors except those that specifically identify an
individual's group membership. When there is no systematic
error (i.e., bias) in the predictors, then there would be no
[...] would add nothing to the prediction. However, if tests are
biased and thereby predict differently across groups, the
addition of group membership as a predictor may increase the
test's utility, but would be objectionable to those who hold
this philosophy. The major reason for the objection lies in
the fact that such predictors are viewed as only being
correlates to psychologically meaningful variables. Thus,
they are considered a "stand-in" for the substantive
psychological differences that exist among people. It is
argued by those who hold a philosophy of qualified
individualism that a variable such as race is only related
to the criterion or set of criteria in an obscure way. Its
value as an explanation is remote at best and its use
provides an easy vehicle for being lax in the quest for
psychologically meaningful predictors.

Since group membership cannot enter into the selection
procedure in any way, no adjustments to a biased test
can be made. Additional predictors can be employed, but if
this is done, it would have to be done for all since group
membership cannot be identified. This qualification would
also rule out the use of different tests for different groups
because, again, this would require treating groups
differently solely as a consequence of a factor (i.e., race)
that has no intrinsic psychological relevance.

The advantages and disadvantages of this approach are
similar to those of the philosophy of unqualified individualism,
with the additional disadvantage that since one cannot use
"stand-in" predictors, there is more of a chance that
assessment batteries will show differential validity across
groups. On the positive side, the assessment practices
evolving from a philosophy of qualified individualism can
never be criticized for overtly making group membership a
feature of the process.

Quotas. A quota philosophy is one in which maximizing
predictive validity is seen as less important than adjusting
cut-off scores to favor one or more groups. Such adjustments
would allow a lower predicted criterion score for some groups
and not for others. These adjustments to cut-off scores can
be made for a variety of reasons. The various selection
models designed to reflect this philosophy embrace a variety
of values regarding fairness. All, however, disagree that
selection based solely on predicting the same criterion
cut-off score for all groups is the fairest of procedures.

Two types of selection models have been proposed under
the quota philosophy. The first type argues that since the
ultimate goal of decision making is to choose those who will
succeed, the best way to set the cut-off score on the
criterion is to base it on the potential success rate of
different groups and not on what a validity study predicts
performance will be on the criterion measure. Thus, this
type adjusts for what is perceived to be unfairness when
imperfect tests are employed. The second type of quota
selection model adjusts the level of predicted criterion
[...] importance of having more of one group selected. It is
regarded by its proponents as being fairer in that it
regulates circumstances in a way that is believed to bring
about "social good." In either of the two cases some
applicants are selected for whom one would predict lower
criterion performance than some applicants who are
rejected.

Selection Models

Several selection models have been proposed that reflect
the various philosophies described above. Those that reflect
the philosophies of unqualified individualism and quotas can
be employed with either biased or unbiased tests. In the
former, the different predictive utilities are corrected in
the prediction equations for the various groups for which the
test has differential validity. Once accomplished, the same
criterion performance level is used to determine the
differing cut-off points on the predictor test(s) to use for
each group. In the quota models, no adjustments are needed
when biased tests are employed since a defined number of
individuals from each group will be selected. Concern is
geared toward the best ranking within groups. In the quota
system differing cut-off points are chosen both to accommodate
the bias in the test and to provide the advantages to
those groups as is deemed fair by the decision-makers. In
most cases, it would be inappropriate to employ biased tests




Assessment  Bias  • 
236 

when one holds a philosophy of qualified individualism since
any adjustment made to the scores would identify group
membership. The only time it would be appropriate is if
tests that were biased for one group were included with other
tests that were biased for other groups in such a way that
the biases balanced out. Thus, everyone would take all tests
and no one would be identified by group membership.

In the remainder of this section, those selection models
that have been most widely debated in the literature will be
briefly described. It is not our purpose to provide detailed
information about each model. For the reader wishing to
employ one of the various models, we refer him/her to the
references cited in this section. Thus, it is our purpose to
provide an overview of the models and to classify them such
that the reader may 1) become familiar with the various
models that have been proposed and the philosophies governing
their use, and 2) preview the more prominent models so the
reader may decide on which one(s) he/she may wish to
investigate further.

Equal Risk Regression Model. This model, named by
Jensen (1980), is the simplest of models. It employs the
same regression line in predicting criterion performance for
all groups and the same criterion cut-off score is used for
all groups to determine who gets selected and who doesn't.
Since the same regression line is used, this model is only
employed with unbiased tests. This model of selection is
acceptable to those who hold unqualified and qualified
individualism [...]

[...] (1971), this model first sets the minimum acceptable criterion
performance and then chooses those individuals based on the
maximum degree of risk of selection error decision makers are
willing to tolerate. This risk factor is the same regardless
of group membership. For each individual the best predictor
or set of predictors is employed. Any test or set of tests
may be used and different tests for different groups are
acceptable. Employing the best predictor(s) for an
individual, criterion performance is predicted and, with the
aid of a normal curve, the applicant's risk of failure is
computed. Individuals, regardless of group membership, are
selected if their risk is less than that set as maximally
acceptable. This model can be used for tests that are biased
in slope, intercept or standard error of estimate, and is
acceptable only to those who hold a philosophy of unqualified
individualism. Since it identifies group membership in its
selection of predictors, it is not an acceptable model to the
qualified individualist. Those who hold a quota philosophy
of fairness likewise find the model unacceptable in that the
criterion performance and consequent minimum acceptable
risk are set the same for all individuals regardless of group
membership.
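
The risk computation referred to above ("with the aid of a normal curve") can be illustrated with a short Python sketch. It assumes, as a simplification, that errors of prediction are normally distributed around the predicted criterion score with a standard deviation equal to the standard error of estimate; the function names and the numbers in the example are hypothetical.

```python
from statistics import NormalDist


def risk_of_failure(predicted, cutoff, se_estimate):
    """Probability that actual criterion performance falls below the cut-off,
    assuming normally distributed errors of prediction."""
    return NormalDist(mu=predicted, sigma=se_estimate).cdf(cutoff)


def accept(predicted, cutoff, se_estimate, max_risk):
    """Select the applicant only if the risk is less than the maximum tolerated."""
    return risk_of_failure(predicted, cutoff, se_estimate) < max_risk


# Two applicants with the same predicted score but different standard errors
# of estimate (for instance, because the test predicts less precisely for one
# group) carry different risks of failure.
print(round(risk_of_failure(60.0, 50.0, 10.0), 3))
print(round(risk_of_failure(60.0, 50.0, 5.0), 3))
print(accept(60.0, 50.0, 10.0, max_risk=0.10), accept(60.0, 50.0, 5.0, max_risk=0.10))
```

The sketch shows why equal predicted performance does not imply equal risk when the standard error of estimate differs.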

Regression Model. This model, proposed by Cleary
(1968), requires that for a test to be used, it must have the
same slope and intercept for all groups. Once employed, the
[...] and the Equal Risk Regression Model is that there is no
requirement that the test have equal SE across groups.
As a consequence, the risk of failure can vary across groups
if the SE is different for different groups. This is the
case even though the prediction of criterion performance is
the same across groups as the consequence of the equal
slopes and intercepts requirement. Since identification of
group membership is not necessary in selection, the model is
appropriate for those who hold a philosophy of either
unqualified or qualified individualism. The use of the same
criterion cut-off, regardless of group membership, denies its
acceptability for those who hold a quota philosophy.

Multiple Regression Model. Proposed by McNemar (1975,
1976), this model is most closely aligned with a philosophy
of unqualified individualism. From the viewpoint of
unqualified individualism, it is the most statistically
sophisticated and appealing of all models proposed to date
(Jensen, 1980). The purpose of the model is to make use of
the best possible set of predictors in the selection process.
Group membership is used to statistically adjust for bias in
predicting the criterion when systematic error is evidenced
in the prediction. Consequently, it does not fulfill the
requirements of qualified individualism.

Once the best predictions are made, the user of this
model maximizes the average level of performance of the
selectees by ranking them in order of predicted performance
[...]
selected. This is done without consideration of group
membership and is consequently not an acceptable model for
those who hold a quota philosophy.

Proportional Representation Model. Simply stated, this
model holds as a requirement that the proportion of selectees
defined by a criterion, such as race, be set equal to some
preestablished proportion such as that represented in the
United States population. Individuals are ranked within each
group and the number of individuals chosen from each group is
accomplished by selecting from the top down. This model,
therefore, employs different test cutoff scores for different
groups with consequent differences in predicted criterion
performance across groups. As such, it can be classed as a
quota model and is unacceptable to those who hold
philosophies of unqualified or qualified individualism.
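
A minimal Python sketch of the within-group, top-down procedure follows. The group labels, scores, and target proportions are hypothetical, and the rounding of each group's share of the openings is an assumption of the sketch rather than part of the model as described.

```python
def proportional_selection(scores_by_group, openings, target_proportions):
    """Rank applicants within each group and select each group's preset share
    of the openings from the top down."""
    selected = {}
    for group, scores in scores_by_group.items():
        n_group = round(openings * target_proportions[group])
        selected[group] = sorted(scores, reverse=True)[:n_group]
    return selected


scores_by_group = {
    "A": [90, 85, 70, 60],
    "B": [80, 65, 55, 50],
}
selected = proportional_selection(scores_by_group, openings=4,
                                  target_proportions={"A": 0.5, "B": 0.5})

# The lowest selected score in each group acts as that group's effective
# test cut-off, which is why the cut-offs differ across groups.
effective_cutoffs = {group: min(scores) for group, scores in selected.items()}
print(selected)
print(effective_cutoffs)
```

In this example the effective cut-off is 85 for one group and 65 for the other, illustrating the differing test cutoff scores the text describes.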

Culture-Modified Criterion Model. This model, proposed
by Darlington (1971), explicitly identifies the decrease in
minimally predicted criterion performance that
decision-makers are willing to accept when selecting minority
individuals. This is, in practice, accomplished by reducing
the prediction of the criterion score of the nonminority
group members to equate them with the predicted score of the
minority group members. Such a practice, then, builds a
desired "bias" into the test by changing the intercept by a
predetermined constant. The degree of bias built into the
[...]
performance would predict a greater probability of failure
than others who are rejected. By using this model, the
decision maker is forced to make explicit in the selection
formula all considerations regardless of whether or not the
considerations were to redress past injustices or to
compensate for perceived biases in assessment practices yet
to be identified. When employing biased tests, differences
in cut-off scores would not only account for differences when
[...] for bias but also adjust for "other"
considerations. The results of employing this model
satisfy a quota philosophy since it accepts different
levels of predicted performance across groups. For the same
reason, it is unacceptable to those holding philosophies of
unqualified or qualified individualism.
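
The intercept adjustment can be illustrated as follows. This is a sketch only: the group labels, the size of the constant, and the predicted scores are hypothetical, and the function names are not drawn from Darlington's paper. The predicted criterion score of nonminority applicants is lowered by a fixed constant before a single ranked list is formed.

```python
def adjusted_prediction(predicted, group, adjustment, adjusted_group="nonminority"):
    """Lower the predicted criterion score of the adjusted group by a constant."""
    return predicted - adjustment if group == adjusted_group else predicted


def rank_names(applicants, adjustment):
    """Return applicant names ranked on the (possibly adjusted) predictions."""
    ranked = sorted(
        applicants,
        key=lambda a: adjusted_prediction(a[2], a[1], adjustment),
        reverse=True,
    )
    return [name for name, _, _ in ranked]


# (name, group, predicted criterion score)
applicants = [
    ("a", "nonminority", 62.0),
    ("b", "minority", 58.0),
    ("c", "nonminority", 70.0),
    ("d", "minority", 50.0),
]

print(rank_names(applicants, adjustment=0.0)[:2])  # no adjustment
print(rank_names(applicants, adjustment=5.0)[:2])  # explicit 5-point adjustment
```

Because the constant appears as an explicit parameter, the size of the trade-off the decision maker accepts is visible in the selection formula itself.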

Constant Ratio Model. Proposed by Thorndike (1971),
this model argues that when the mean difference between group
scores on the predictor test(s) is greater than the mean
difference between group scores on the criterion test,
unfairness occurs. Since the correlation between the test
and the criterion is imperfect, there is the possibility that
the above will occur. When it does, the cut-off point used
for the predictor test may exclude from selection some of the
low scoring group who would be expected to pass if previous
group performance on the criterion were used to predict
future performance. As a consequence, Thorndike suggests that
[...] minority group individuals
that would succeed on the criterion if given a chance. For
example, if 30 percent of the minority group members and
40 percent of the majority group members succeed on the
criterion, then the cut-off score on the predictor test
should be set so 30 percent of the minority group members and
40 percent of the nonminority group members are selected.
Since this model varies the acceptable criterion level across
groups, it is a quota model. Yet, it differs from those
quota models mentioned above in that the adjustments are made
as a consequence of potential unfairness due to imperfect
predictors, not based on ethical or moral values concerning
the ultimate "good" of the selection process.
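
The 30 percent and 40 percent example can be worked through in a short Python sketch. The predictor scores below are hypothetical, and the helper name is an assumption of the sketch; the point is only that each group's predictor cut-off is placed so that the proportion selected matches the proportion of that group succeeding on the criterion.

```python
def cutoff_for_rate(scores, selection_rate):
    """Return the lowest predictor score admitted when the top `selection_rate`
    share of a group is selected (None if no one is selected)."""
    n_selected = round(len(scores) * selection_rate)
    if n_selected == 0:
        return None
    return sorted(scores, reverse=True)[n_selected - 1]


minority_scores = [40, 45, 50, 55, 60, 65, 70, 75, 80, 85]
nonminority_scores = [50, 55, 60, 65, 70, 75, 80, 85, 90, 95]

# Proportion of each group succeeding on the criterion, as in the text's example.
minority_cutoff = cutoff_for_rate(minority_scores, 0.30)
nonminority_cutoff = cutoff_for_rate(nonminority_scores, 0.40)
print(minority_cutoff, nonminority_cutoff)
```

With these illustrative scores the two groups receive different predictor cut-offs (75 and 80), which is what makes the model a quota model.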

Conditional Probability Model. Similar to Thorndike's
model, this model, described by Cole (1973), is a quota model
based on a belief in fairness stemming from problems evolving
out of the use of imperfect tests. Cole (1973) argues that
there should be the same probability of selecting minority
and nonminority group members as defined by each group's
probability of achieving satisfactorily on the criterion. As
the name implies, it differs from Thorndike's model in its
use of conditional probabilities rather than constant ratios.
However, its intent is the same.

Equal Probability Model. This model, proposed by Linn
(1973) and named by Petersen and Novick (1976), is a quota
model designed to equate chances of success across groups.
This requires that the predictor test cut-off scores be set
so that the same proportion of minority group members will be
selected who are predicted to succeed as nonminority group
members. Under circumstances where there is a large
discrepancy between the means of the minority and nonminority
groups on the predictor test, with the minority mean below the
nonminority mean and a high cut-off on the criterion, it
would be necessary under this model to deny selection of some
of the best nonminority applicants so that the proper
proportions can be maintained.

Probability Weighted Model. This model was first
described by Bereiter (1975) and gives everyone some chance
of being selected. However, the probability of their
being selected is defined by their probability of success as
indicated on the predictor test. Under all of the other
models discussed, there is a proportion of individuals whose
performance would not allow them to be considered for
selection. Bereiter argues that because of the imperfect
nature of tests even low scoring individuals have some chance
of succeeding. Consequently, those individuals also should
be considered, regardless of how small the chances are for
selection. Making use of the cut-off score, the predicted
score, and the SE, one can calculate the percent chance an
individual has of succeeding. If one individual has [...]
latter individual should be given a [...] times greater chance
of being selected than the former individual. The
Probability Weighted Model is a quota model "since it selects
from each group a proportion that equals the proportion that
would exceed the criterion cut-off" (Jensen, 1980, p. 407).
In that sense it is similar to the Constant Ratio Model.
However, while the Constant Ratio Model selects persons
within groups to maximize the criterion performance of those
selected, the random selection procedure of the Probability
Weighted Model does not result in such maximization. Instead
it allows for some who have a lower chance of succeeding on
the criterion to be selected.
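
The weighting idea can be sketched in Python as a lottery in which each applicant's chance of being drawn is proportional to his or her probability of success. The sketch assumes normally distributed errors of prediction, and the function names, applicant labels, and numbers are hypothetical; it is one possible reading of the procedure, not Bereiter's own formulation.

```python
import random
from statistics import NormalDist


def success_probability(predicted, cutoff, se_estimate):
    """Chance of reaching the criterion cut-off, assuming normal prediction errors."""
    return 1.0 - NormalDist(mu=predicted, sigma=se_estimate).cdf(cutoff)


def weighted_lottery(predictions, cutoff, se_estimate, openings, rng):
    """Draw applicants without replacement, each with a chance proportional
    to his or her probability of success."""
    pool = dict(predictions)  # name -> predicted criterion score
    chosen = []
    while pool and len(chosen) < openings:
        names = list(pool)
        weights = [success_probability(pool[n], cutoff, se_estimate) for n in names]
        pick = rng.choices(names, weights=weights, k=1)[0]
        chosen.append(pick)
        del pool[pick]
    return chosen


predictions = {"a": 60.0, "b": 50.0, "c": 40.0}
p_b = success_probability(predictions["b"], cutoff=50.0, se_estimate=10.0)
p_c = success_probability(predictions["c"], cutoff=50.0, se_estimate=10.0)
print(round(p_b / p_c, 2))  # how many times more likely "b" is to succeed than "c"
print(weighted_lottery(predictions, 50.0, 10.0, openings=2, rng=random.Random(0)))
```

Even the lowest scoring applicant retains a nonzero chance of selection, which is why the procedure does not maximize the criterion performance of those selected.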

Expected Utilities Model. The applications of this
model, first proposed for decision making in economics by von
Neumann and Morgenstern (1944) and again by Wald (1950), have
recently been explicated for use as a selection model by
Gross and Su (1975), Petersen (1975), and Petersen and Novick
(1976). This model is highly recommended by its proponents
since it can be adopted by all decision makers, regardless of
their fairness philosophy. The model forces the decision
makers to decide explicitly what considerations, if any, they
wish to include in the process. Then, weights are given to
the desirability or utility of the various possible outcomes
in such a way as to maximize the utility of the selections
made. When we consider the fact that we are predicting
performance on a criterion with less than perfect tests
(i.e., tests with less than perfect reliability and validity) one can
view the outcomes of decisions made with these tests as one
of four types. These include situations where an individual
is (1) selected and performs as predicted (true positive);
(2) selected and does not perform as predicted (false
positive); (3) not selected and would have performed as
predicted (true negative); and (4) not selected and would not
have performed as predicted (false negative). If each of
these outcomes is given a weight, then the outcomes that are
desired for each group tested can be decided beforehand and a
formula constructed to meet the [...]. Weights can be
assigned to the desirability of each of these
outcomes and a selection [...] that would
manipulate the probability [...] each type. For
example, if one wishes to use a quota to maintain
proportional representation of minority and nonminority
individuals, one can do so by varying the weights assigned to
the outcomes across groups. If proportional representation
is desired, then one may wish to give added weight to the
outcome of selecting minority individuals for whom one may
not predict success but who may succeed given the chance. By
manipulating this number, proportional representation can be
assured.

As mentioned above, all philosophies of fairness can be
satisfied through using this model. In addition, all models
discussed in this section can be viewed as derivatives of
this model since all require, either implicitly or
explicitly, the assignment of utilities. In qualified and
unqualified individualism the utilities across groups are the
same. Each of the quota models varies utility across groups
to effect fairness. For example, the Constant Ratio Model,
when conceived from the above perspective, requires that the
sum of the true positive and false positive expected
utilities divided by the sum of the true positive and false
negative expected utilities be the same across groups. The
expected utilities are the assigned weights or utilities for
each outcome multiplied by the conditional probabilities of
each outcome summed over all applicants. Likewise, the
Conditional Probability Model, when stated in terms of the
Expected Utilities Model, requires that the expected
utilities of the true positive divided by the sum of the true
positive and false negative expected utilities be the same
across groups.
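
The ratios described above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions:
the class and function names are hypothetical, the outcome
probabilities and weights are invented, and the Constant Ratio
is computed as (true positive + false positive) divided by
(true positive + false negative), that is, the proportion
selected over the proportion who would succeed.

```python
from dataclasses import dataclass


@dataclass
class Outcomes:
    """One value per decision outcome for a single group.

    Used both for outcome probabilities and for the weights
    (utilities) a decision maker assigns to each outcome.
    """
    true_pos: float   # selected, performs as predicted
    false_pos: float  # selected, does not perform as predicted
    true_neg: float   # not selected, would not have succeeded
    false_neg: float  # not selected, would have succeeded


def expected_utilities(prob: Outcomes, weight: Outcomes) -> Outcomes:
    """Multiply each outcome probability by its assigned weight."""
    return Outcomes(
        prob.true_pos * weight.true_pos,
        prob.false_pos * weight.false_pos,
        prob.true_neg * weight.true_neg,
        prob.false_neg * weight.false_neg,
    )


def constant_ratio(eu: Outcomes) -> float:
    """(TP + FP) / (TP + FN): selected relative to successful."""
    return (eu.true_pos + eu.false_pos) / (eu.true_pos + eu.false_neg)


def conditional_probability(eu: Outcomes) -> float:
    """TP / (TP + FN): share of potential successes selected."""
    return eu.true_pos / (eu.true_pos + eu.false_neg)


# Hypothetical outcome probabilities for two groups (illustrative only).
group_a = Outcomes(0.30, 0.10, 0.40, 0.20)
group_b = Outcomes(0.15, 0.05, 0.50, 0.30)

# Equal weights across outcomes and groups (hypothetical choice).
unit = Outcomes(1.0, 1.0, 1.0, 1.0)

eu_a = expected_utilities(group_a, unit)
eu_b = expected_utilities(group_b, unit)

if __name__ == "__main__":
    print("constant ratio:", constant_ratio(eu_a), constant_ratio(eu_b))
    print("conditional probability:",
          conditional_probability(eu_a), conditional_probability(eu_b))
```

With equal weights the two hypothetical groups yield different
ratios under both models. A quota model, in the sense described
above, would vary the weights across groups until the chosen
ratio is the same for each group.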

Fairness or Bias. When writing about the various models
and their use in the selection process, some authors have
referred to it as an issue of bias while others have referred
to it as an issue of fairness. For our present purpose, we
employ both terms in differentiating between the two. In our
review we have noted that bias can be equated with the
concept of validity when validity is conceived in its
broadest sense. We also implied that there are two classes
of validity, construct validity and outcome validity, and
consequently two forms of bias, construct bias and outcome
bias. Outcome validity is that type of validity which
provides information on the utility of a test in predicting
desired outcomes. Outcome bias, then, relates to tests that
predict desired outcomes differently across groups. The
choice of selection models briefly described above is not an
issue of validity. The model chosen for use in any selection
process is based on one's philosophy of fairness. There is
no one philosophy that is more valid than another. However,
whether or not the model when implemented results in the
desired outcome (e.g., proportional representation) is an
issue of outcome validity. If, when implemented, a
systematic error results, then the use of the model is
biased. From this perspective a fair model may be biased in
the sense, and only in the sense, that its implementation
does not yield the outcomes as predicted from the model.

Empirical Studies in Selection Bias. A few studies have
been conducted that qualify under our present
conceptualization of bias in selection since they focus on
the validity of the test with respect to selection outcomes
as opposed to the validity of the test in demonstrating it to
be an effective measure of a construct. As mentioned
previously, external criteria used in many predictive
validity studies are chosen to demonstrate that the test, or
a measure of a construct, is acting as the construct it is
supposed to measure. While these studies provide the
decision maker some information relevant to the situation in
which they intend to use the test, the possibility exists
that in certain situations the validity of the criteria falls
short for specific decision making. The exception to this
general rule lies in the employment testing literature. In
this literature criterion-related validity researchers often
choose criteria whose face validity is quite high for
decision making. However, even in this literature concern
has been raised that the general use of cognitive abilities
tests to predict job performance may be requiring inferences
that invalidate their carte blanche use across employment
settings. Specifically, concern has been voiced that the
predictive validity information on certain types of cognitive
abilities tests in predicting certain types of job
performance does not warrant the generalized use of all
cognitive abilities tests for all job functioning. It
may be that the validity of a test is situationally specific.
Ghiselli (1966), after observing considerable variability in
validity coefficients across studies, noted this concern.
Schmidt, Hunter, and Urry (1976) examined this possibility
and noted that tests that show validity as offered in one
situation appeared to be invalid in up to 50 percent of the
studies employing their use in predicting job performance in
other situations. However, a recent series of studies has
found this invalidity to be a statistical artifact mainly
resulting from sampling error, differences across studies in
test and criterion reliability, and differences in range
restriction (Callender & Osburn, 1980; Lilienthal &
Pearlman, in press; Pearlman et al., 1980; Schmidt,
Gast-Rosenberg, & Hunter, 1980; Schmidt & Hunter, 1977;
Schmidt, Hunter, & [name and year illegible]). In another
study conducted by Schmidt, Hunter, and Pearlman (1981),
tests that measure various cognitive abilities were found
valid in predicting job performance across a family of five
different clerical positions. Similar findings are reported
by Hunter (1980) who showed that across 515 criterion-related
validity studies employing a variety of criterion measures
purported to be valid across a variety of jobs, the
validities of a composite of verbal and quantitative ability
measures in predicting classes of jobs grouped according to
the complexity of their information-processing requirements
ranged from .23 to .56. Hunter (1980) concludes that there
appears to be validity in using these cognitive tests in
predicting job performance even for the lowest skill jobs.
As a result of these studies reporting on the
generalizability of a variety of cognitive ability measures
for a variety of job families, Schmidt and Hunter (1981)
conclude that, with respect to employment testing, "our
evidence shows that the validity of the cognitive tests
studied is neither specific to situations nor specific to
jobs" (p. 11[illegible]). Another conclusion that can be
drawn from these studies is that since the tests are unbiased
in predicting job performance across groups in those studies,
there should be no reason to question their unbiased nature
when the tests are employed in a general way across
situations and jobs.
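
The claim that apparent situational specificity can be an
artifact of sampling error can be illustrated with a small
sketch. The code below is not the procedure of any study cited
above. It is a simplified "bare-bones" calculation that
compares the observed variance of validity coefficients with
the variance expected from sampling error alone, using the
common approximation (1 - mean_r^2)^2 / (mean_n - 1). The
function name and the five studies are hypothetical, and
corrections for unreliability and range restriction are
deliberately omitted.

```python
def bare_bones_summary(studies):
    """Summarize validity coefficients from several studies.

    studies: list of (r, n) pairs, where r is an observed
    validity coefficient and n is that study's sample size.
    Returns the sample-size-weighted mean r, the weighted
    observed variance of r, the variance expected from sampling
    error alone, and the residual variance (floored at zero).
    """
    total_n = sum(n for _, n in studies)
    mean_r = sum(r * n for r, n in studies) / total_n
    observed_var = sum(n * (r - mean_r) ** 2 for r, n in studies) / total_n
    mean_n = total_n / len(studies)
    # Approximate sampling-error variance of a correlation.
    sampling_var = (1 - mean_r ** 2) ** 2 / (mean_n - 1)
    residual_var = max(observed_var - sampling_var, 0.0)
    return {
        "mean_r": mean_r,
        "observed_var": observed_var,
        "sampling_var": sampling_var,
        "residual_var": residual_var,
    }


# Hypothetical validity studies: (observed r, sample size).
example_studies = [(0.10, 50), (0.35, 60), (0.22, 40), (0.40, 70), (0.15, 55)]
summary = bare_bones_summary(example_studies)

if __name__ == "__main__":
    for key, value in summary.items():
        print(f"{key}: {value:.4f}")
```

In this invented example the coefficients range from .10 to
.40, yet the variance expected from sampling error alone is at
least as large as the variance observed, so nothing remains to
be explained by differences between situations.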

In the application of tests for making educational
decisions regarding special education diagnosis and
placement, the validity issues are somewhat different. Tests
selected for making such decisions are validated to
demonstrate their utility in measuring a construct. External
criteria used to validate IQ tests, for example, are chosen
to demonstrate the ability of the test as a measure of
intelligence. Similarly, when the same external criteria are
used to determine if bias exists, the question addressed
relates to whether or not the test is differentially valid in
the measurement of the construct across groups.

External criteria employed to validate IQ tests
(usually a standardized measure of academic achievement) are
related to a desired outcome of the selection phase of the
decision making process (i.e., choosing those who will not
perform without intervention). From this it is inferred that
the predictive validity studies so offered provide validity
for the use of the test in decision making. Similarly, with
respect to bias, the assumption is made that if the tests are
unbiased in measuring the construct, they are unbiased when
they are used in decision making. However, whether or not an
IQ test predicts if a child will or will not be able to
perform with or without intervention equally well across
situations for culturally different children is a question
yet to be answered. Indeed there are those that would argue
that not only has the question not been answered, but neither
has the more basic question: "How well does an intelligence
test predict that culturally different children will not
perform differently if they are not selected for placement?"
It is pointed out by some that this question would have to be
answered by using criteria more relevant to the decision
making process than scores on standardized tests of academic
achievement. Only a handful of studies are reported in the
literature that address this concern. One study, conducted
by Goldman and Hartig (1976), was given an inordinate amount
of weight in Judge Peckham's decision in the Larry P. case
(see Chapter 8) for the very reason that the criteria to
which it predicted were judged more closely related to those
required for making placement decisions than standardized
tests of intelligence. This study produced quite different
results than those reported in Chapter 4 under "External
Construct Bias". Most evidence that examined IQ tests for
differential validity in predicting academic achievement
across races found no such evidence. In the Goldman and
Hartig (1976) study, the authors employed a criterion measure
of achievement, grade point average (GPA), that included,
among other school subjects, grades in music, health, art,
and physical education. Correlations between the WISC
Full-Scale IQ and GPA were .25 (p<.01) for white children,
.12 (p<.05) for Mexican-American children, and .14 (p<.01)
for black children.

The correlation for whites is substantially lower than
those reported in other studies, and the substantially lower
correlations for minority children suggest that IQ tests may
be invalid for use with other than nonminority populations in
predicting school achievement (as differentiated from
academic achievement). The Goldman and Hartig (1976) study,
however, has several serious methodological flaws. First,
GPA for blacks and Mexican-Americans showed considerable
restriction of range. Second, groups were combined across
schools and assumed to reflect a common standard used in
grading. Consequently one must be concerned with the
heterogeneity of the data collected on the criterion.
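
Why restriction of range matters can be shown with a short
sketch. It uses the standard formula for direct restriction on
one variable (often called Thorndike's Case II), which assumes
a linear, homoscedastic relationship; whether those assumptions
held for the GPA data discussed above is not known, so the
numbers below are purely illustrative and the function names
are hypothetical.

```python
import math


def restricted_r(r_full: float, u: float) -> float:
    """Correlation expected after direct range restriction.

    r_full: correlation in the unrestricted population.
    u: ratio of restricted SD to unrestricted SD (0 < u <= 1).
    """
    return r_full * u / math.sqrt(1 - r_full ** 2 + (r_full ** 2) * (u ** 2))


def corrected_r(r_restricted: float, u: float) -> float:
    """Estimate the unrestricted correlation from a restricted one.

    Inverse of restricted_r under the same assumptions.
    """
    big_u = 1.0 / u  # unrestricted SD relative to restricted SD
    return r_restricted * big_u / math.sqrt(
        1 + (r_restricted ** 2) * (big_u ** 2 - 1)
    )


# Hypothetical example: a correlation of .50 in the full range
# observed in a sample whose spread is half that of the full range.
shrunken = restricted_r(0.50, 0.5)
recovered = corrected_r(shrunken, 0.5)

if __name__ == "__main__":
    print(f"restricted r: {shrunken:.3f}")
    print(f"corrected back: {recovered:.3f}")
```

Under these assumptions, halving the spread of scores cuts a
correlation of .50 roughly in half, which is why a low validity
coefficient in a group with restricted grades cannot by itself
be read as evidence of lower validity for that group.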

In another study, relating WISC-R factor scores to a
10-item academic teacher rating, Reschly and Reschly (1979)
obtained results more comparable to external construct bias
studies employing standardized achievement tests than those
found in the Goldman and Hartig (1976) study. In the Reschly
and Reschly study, the correlations between the Verbal
Comprehension factor of the WISC-R and teacher ratings were
.3[digit illegible], .16, and .32 for whites, blacks, and
Mexican-American students, respectively. The correlations
between the Perceptual Organization factor of the WISC-R and
teacher ratings were .22, .26, and .27 for whites, blacks,
and Mexican-Americans, respectively. The magnitude of the
relationships was not as high as those relating IQ to
standardized tests of academic achievement but they are
similar in that they do not differ across groups. These
relationships did not hold for Native American Papago
students.

Using similar GPA criteria to those employed by Goldman
and Hartig (1976) and teacher ratings of competence,
sociability and social conformity, Mercer (reported in
Mercer, 1979) reports consistently higher correlations
between WISC IQ scores and GPA for whites than for blacks and
Mexican-Americans [ranges illegible]. However, correlations
between the GPA and teacher ratings and the Verbal Scale of
the WISC were higher than those between the criterion
measures and the Performance Scale of the WISC.

The results of the Mercer study as well as those of
Goldman and Hartig are suggestive at best. No comparisons
between the validity coefficients were reported in either
study so it is impossible to determine if the reported
differences are statistically significant. Additionally, the
correlations between the WISC and GPA are of different
magnitude (the Mercer correlations appearing somewhat higher)
suggesting possible differences in the criterion measures
used. When comparing the teacher rating studies of Mercer
and Reschly and Reschly, the findings appear to be similar
with no consistent differences appearing between groups.
Surely, the paucity of research in this area leaves us
wanting.
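
The missing comparison between validity coefficients is
usually made with Fisher's r-to-z transformation for two
independent correlations. The sketch below shows that test.
The correlations .25 and .12 are the values reported above for
the Goldman and Hartig study, but the sample sizes are
hypothetical placeholders, so the result says nothing about
the actual study.

```python
import math


def compare_independent_correlations(r1, n1, r2, n2):
    """Test whether two independent correlations differ.

    Uses Fisher's r-to-z transformation. Returns the z statistic
    and the two-sided p value under the normal approximation.
    """
    z1 = math.atanh(r1)  # Fisher transform of the first correlation
    z2 = math.atanh(r2)  # Fisher transform of the second correlation
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / standard_error
    # Two-sided p value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Reported correlations with HYPOTHETICAL sample sizes of 200 each.
z_stat, p_value = compare_independent_correlations(0.25, 200, 0.12, 200)

if __name__ == "__main__":
    print(f"z = {z_stat:.2f}, two-sided p = {p_value:.3f}")
```

With these placeholder sample sizes the difference between .25
and .12 would not reach the .05 level, which illustrates why
differences in reported coefficients cannot be interpreted
without such a test.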

The question has been raised in the literature regarding
the legitimacy of using school achievement as a criterion
measure as opposed to a measure of academic achievement
(Reynolds, 1982). Hopefully, the conceptualization of the
various issues presented here provides an alternative way of
viewing this problem. When one is using an IQ test to
measure intelligence, then certainly a measure of academic
achievement is best employed since one would predict that the
more intelligent a child is, the more that child will achieve
academically. When viewed from this perspective, the use of
a measure of school achievement that includes achievement in
music and physical education contaminates the purity of the
criterion measure. However, intelligence tests are not used
for diagnosis alone. They are also used for making placement
decisions and sometimes for helping to design specific
interventions. With respect to the former, the purist may
argue that the placement is automatic once the classification
is made. In practice that may not always be the case.
Indeed, our personal observations suggest that often just the
opposite is true, especially in cases where the diagnosis is
unclear. That is to say, decision makers may first decide if
the placement will benefit the child and then decide,
according to their placement decision, whether or not to
diagnose the child EMR. Legal requirements in some states
mandating proportional representation also influence the
diagnosis-placement decisions. Whether or not a district has
met its quota of one group of children in EMR classes may
also influence diagnosis.

Many other examples can be cited where the proverbial
tail wags the dog. This dilemma is fed by the continuing
requirement that diagnosis be a prerequisite to placement, a
requirement that schools sometimes find difficult to adopt,
especially given its unimportance in meeting their major
purpose, helping the child learn better. The point is that
the decision to place a child is more complicated than
[illegible]. When a decision maker is confronted with making
a decision as to whether or not a child is in need of help,
what information should they have on hand? Should it be
information on predicting future performance on a
standardized achievement test, GPA, teacher ratings or some
other criteria? Should we not worry about such
considerations and only concern ourselves with making proper
diagnosis? These questions can only be answered by those
having to make the decisions after a critical analysis of the
whole purpose of assessment activities.

Intervention Bias

The major purpose for employing tests in selection is to
answer a question regarding an individual's future
performance as predicted from the test. The use of tests for
making intervention decisions, on the other hand, focuses on
predicting an effective intervention from the test. While
data gathered for selection decisions can tell us whether or
not a child needs help, it is data gathered for intervention
decisions that aid in identifying how to provide help. With
respect to bias, a similar distinction can be made.
Selection bias is bias that occurs when employing tests that
result in systematic error in the identification of children
across groups who need help, while intervention bias involves
systematic error in predicting successful interventions
across groups. So, for example, a placement intervention
(e.g., special education placement) that is effective for one
group and not for another would be considered intervention
bias. Viewed within the conceptualization of validity in
outcomes, intervention bias would be that which occurs in
tests or other assessment strategies that are valid in
predicting successful intervention for one group and less
valid or invalid in predicting this desired outcome for
another group.

As mentioned earlier in this chapter, data employed in
decision making can come from a variety of sources. Data can
be generated from test- and nontest-based assessment
strategies. Those assessment procedures that generate both
test- and nontest-based data have in common the fact that
they are planned. We can distinguish the data generated from
these planned procedures from data that are employed in
decision making but not planned. Data derived from clinical
impressions, the nature of the referral problem, and
naturally occurring characteristics such as race, sex and
socio-economic status are samples of what we are presently
identifying as unplanned. While data drawn from clinical
impressions can be planned in the sense that they are
consciously derived from either test or nontest procedures,
they are unplanned in that they are inferred from assessment
strategies designed for other purposes. The common feature
of all unplanned data is that they are impressionistic.

Intervention Bias With Planned Data. One of the most
intrusive interventions that commonly occur in schools is
placement in self-contained special education classes.
Subsequent to such an intervention a child is assessed by
[illegible] assessment strategies to determine if placement
is warranted. It would follow that the strategies employed
to determine placement should be able to predict that the
child would [illegible]. Such a [illegible] assumes that the
intervention benefits some [illegible]. In the case of
special education placement, this assumption has been
questioned. A review of the empirical literature on
homogeneous grouping for instructional purposes does not
support this assumption ([citation illegible]), and the
question of the value of special education as presently
conceived as a form of intervention has been an ongoing topic
of discussion (see Hobbs, 1975). When intelligence test data
are employed, for example, to support placement in a class
for the mentally handicapped, inferences are being drawn from
those data for which there is no outcome validity evidence.
Since intelligence tests do correlate with academic
achievement there is some outcome validity evidence to infer
that the child needs help, but to take it one step further
and say that from the use of the test one can predict the
child will be better off if he/she is placed has no support
in the empirical literature.

Given such a circumstance the issue of outcome bias with
respect to placement becomes a moot point. In order to show
that tests are biased with respect to making placement
decisions, one needs to show that there is differential
[text missing at page break] placement are not biased toward
any one group, but ineffectual for all groups.

In addition to placement decisions, other forms of
intervention for children experiencing learning and
adjustment problems in school are typically recommended based
on both test- and nontest-based data. As discussed in Chapter
2, the assessment strategies employed in such circumstances,
as well as the subsequent intervention recommended, are
largely influenced by the assessor's beliefs regarding the
nature of the problem. Quay (1973) identifies three
conceptual models that influence an assessor's views of the
educationally handicapped child. The first involves a
belief that the exceptional child suffers from a dysfunction
in either their cognitive, perceptual, or motor processing
capabilities. This process dysfunction view further holds
that the dysfunctional processes are unremediable. Such a
view results in intervention recommendations that attempt to
bypass or compensate for the "damaged" process or processes.

The second viewpoint, the experiential defect view,
involves a belief that the problems in processing evidenced
in the child are the consequence of defects in the child's
experiences that have left him/her with the present
dysfunction. Remedial recommendations drawn from such a
viewpoint center around efforts to directly intervene where
defects exist to remedy the effects of deficient experience.
[Text missing at page break] within the child but rather by
the lack of proper exposure or instruction. This third
viewpoint, the experience deficit view, leads to assessment
and consequent interventions that directly address the skill
deficit evidenced in the child.

Consistent with these viewpoints have been a variety of
assessment-intervention models proposed in the literature.
Those who hold the first two viewpoints, in which the problem
is believed to be a problem in processing, have proposed a
variety of diagnostic-prescriptive models to help children.
The assessment techniques employed in these models, such as
the ITPA, are designed to measure defective or dysfunctional
processes. Those who hold the experience deficit viewpoint
have proposed what Ysseldyke and Salvia (1974) refer to as
the task-analytic or skills training approach. These
approaches usually employ assessment techniques such as
direct observation or criterion-referenced tests to measure
specific deficit skills.

Ysseldyke and Mirkin (1982) identify a variety of
diagnostic-prescriptive models that have been proposed to
deal with a myriad of inferred processing problems. These
include models designed to address vision problems (Hernetta,
Ihf,?;   Coleman,    19r,R;    Coleman   ^  Oawson,    106^;      -'bard,  Mounhton 

Thomas,    1972;    Kwa  1  t  ,    1962;    Forrest,    ]9r,R;    Cetman,  ]9r.2, 
196r,a,    19r,Gb,    1972  ;   Cetz,'l  97  3  ;   Gould,    1982;   Cre^nsoan,    1  973  ; 
■alliwell  &    Solan,    1972;    Kane,    19-72;    Kirshnor,    1 9^7 ;   Mul  1  i  ns  ^ 
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,,,,  ,  ,  ,       ■     ;  1  n  :•.-■>  :i      T  " ?  ;    .'^w  T  /  t;  o  n  t"  , 

nvi;    vi,v,   vcCr^rt-hy,   ?-    Kirk,  'Un?^koti,   -is-^man  ^. 

NUM:-,l.c,ff  ,    1<>7?),    orobloms    in   perceptual  motor  dy  r.  [  unc  1 1  on  i  n:, 
(Arena,  Rarsch,    19f-S,    .]nr,7,  Rehrmann  ,/  1  97  f ; 

Dunsing & Kephart, 1965; Early & Sharpe, 1970; Frostig, 1967,
1972; Frostig & Horne, 1964; Frostig, Lefever & Whittlesey,
1961; Johnson & Myklebust, 1967; Kephart, 1960, 1964, 1971;
Magdol, 1971; Marten & Haroion, 1962; Roach & Kephart, 1966;
Smith, 1968; Sutphin, 1964; Van Witsen, 1967), sensory
integration problems (Ayres, 1972), modality problems
(deHirsch, Jansky & Langford, 1966; Lerner, 1971; Johnson &
Myklebust, 1967; Wepman, 1967), and problems in rhythm and
body balance (Rice, 1962).

Programs identified by Ysseldyke and Mirkin (1982) as
designed to represent a task-analytic or skills-training
approach include directive teaching (Stephens, 197^), direct
instruction (Carnine & Silbert, 1979), DISTAR (Becker &
Engelmann, 1978), data-based instruction (Deno, 197?; Fox,
Egner, Paolucci, Perlman & McKenzie, 1973), data-based
program modification (Deno & Mirkin, 1977), exceptional
teaching (White & Haring, 1976), individual instruction
(Peter, 1972), precision teaching (Lindsley, 1964, 1971), and
responsive teaching (Hall & Copeland, 1971). In addition to
these general intervention models, the behavior therapy
literature is robust with additional skills-training
approaches. While the above models are general models, the
behavior therapy approaches (e.g., reinforcement) are problem
specific. While diagnostic-prescriptive models focus on
efforts to remediate processes, the task-analytic or skills
training models focus on their adherence to "sequential,
systematic, intensive, individualized or small group
instruction on skills that are directly related to the
academic and social requirements of the school program"
(Ysseldyke & Mirkin, p. 398).
The empirical literature on the outcome validity of the
assessment approaches employed in the diagnostic-prescriptive
and skills training models is revealing. In reviewing the
literature on diagnostic-prescriptive models, Ysseldyke
(1973) describes three common research methodologies that
have been employed: (1) descriptive, (2) gain-score, and (3)
aptitude-treatment interaction (ATI). The first,
descriptive, attempts to establish a relation between the
ability or process and academic achievement. Such
information lends support to the construct validity and
selection validity of these tests. With respect to this
descriptive research, Ysseldyke and Mirkin (1982) conclude
that "in spite of numerous textbook claims for the
relationship between performance on measures of specific
[abilities] and on measures of academic achievement, extensive
reviews of the research indicate little empirical evidence
for such claims" (p. 400).
Gain-score research that attempts to show gains in
ability following training provides little or no empirical
support for a variety of programs examined. Most heavily
researched are psycholinguistic and perceptual-motor training
programs. Unfortunately, research in this area is
characterized by serious methodological flaws. Failure to
consider the Hawthorne effect, regression effects, linearity
across different levels, and lack of reliability in the
measures employed in the assessment of both ability and
achievement make interpreting this literature extremely
difficult (Ysseldyke, 1973). Evidence from the
methodologically sound gain-score studies provides little
support for the validity of these interventions.

ATI research employs a sound methodology for examining
the effects of intervention programs, with efforts to identify
the differential effect of instructional treatments with
children who differ on certain abilities. The goal of the
research in this area is to show that individual differences
(e.g., intelligence) are important to consider when
designing instructional programs. So, for example, an
interaction between the various levels of an attribute across
individuals and the treatments employed would lend evidence
for prescribing different treatments to those who differ in
the attribute. Research in search of ATIs has met
with little success. In a review of 90 ATI studies, Bracht
(1970) found 85 of them to produce no predicted interactions
^•inn, Prnqor, and Cr:r-;^ (1^7M ba^o

under investigation,
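
The logic of an aptitude-treatment interaction can be sketched
with a small worked example. The following is a minimal
illustration only, assuming a two-level attribute ("low" and
"high") and two treatments ("A" and "B"); the scores, labels, and
function names are hypothetical and are not drawn from any study
cited here. It is purely descriptive: it computes cell means and
the interaction contrast, and it does not perform the
significance tests an actual ATI study would require.

```python
from statistics import mean


def cell_means(records):
    """Mean outcome for each (aptitude level, treatment) cell."""
    cells = {}
    for aptitude, treatment, outcome in records:
        cells.setdefault((aptitude, treatment), []).append(outcome)
    return {cell: mean(scores) for cell, scores in cells.items()}


def treatment_effects(means, treat_a="A", treat_b="B"):
    """Treatment A minus treatment B, computed at each aptitude level."""
    levels = sorted({aptitude for aptitude, _ in means})
    return {lvl: means[(lvl, treat_a)] - means[(lvl, treat_b)] for lvl in levels}


def interaction_contrast(means, low="low", high="high"):
    """Difference between the treatment effect at high and at low aptitude.

    A value of zero means the treatment effect is the same at both
    levels, i.e., no interaction in these cell means.
    """
    effects = treatment_effects(means)
    return effects[high] - effects[low]


def is_disordinal(means, low="low", high="high"):
    """True when the better treatment differs between aptitude levels."""
    effects = treatment_effects(means)
    return effects[low] * effects[high] < 0


# Hypothetical achievement scores: (aptitude level, treatment, outcome).
records = [
    ("low", "A", 10), ("low", "A", 12),
    ("low", "B", 16), ("low", "B", 18),
    ("high", "A", 20), ("high", "A", 22),
    ("high", "B", 15), ("high", "B", 17),
]

means = cell_means(records)
print(treatment_effects(means))
print(interaction_contrast(means), is_disordinal(means))
```

In these made-up data treatment B is better for the low group and
treatment A for the high group, which is the disordinal pattern
that would justify prescribing different treatments. When the
treatment effect is the same at both levels the contrast is zero,
and the attribute gives no basis for differential prescription.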

It can be concluded from the abundance of evidence
available to date that there is little empirical support for
predicting effective interventions from the process tests
commonly employed in special education decision making. With
respect to intervention bias, then, we again find ourselves in
the position of suggesting that there is no evidence of
intervention bias with diagnostic-prescriptive approaches for
the simple reason that there is no support for their validity
with any group.

The literature on skills training approaches has been
more successful in demonstrating the effectiveness of
interventions. Consequently there is a literature that lends
outcome validity evidence for the use of the assessment
strategies employed in these approaches. Consistent with the
experiential deficit view of educational exceptionality,
these assessment approaches are direct in that they focus on
the measurement of behaviors that are directly related to the
presenting problem. This is in contrast to the behaviors
measured in the diagnostic-prescriptive approaches that are
inferred to be indirectly related to the presenting problem.
So, for example, if the presenting problem is poor reading
achievement, the skills training approaches focus on
behaviors that are functionally related to reading while
diagnostic-prescriptive approaches focus on the measurement

The most successful of the skills training approaches
have been those that employ continuous measurement of the
targeted criterion behavior. Given the tentative nature of
our understanding of the assessment-intervention process it
is rather presumptuous to assume that a single measure taken
before the establishment of an intervention can provide the
information necessary to plan and implement an effective
intervention (Deno, Mirkin & Shinn, 197P,). The continuous
collection of data allows for continuous refinement in the
intervention program and consequently more effective learning
(Van Etten & Van Etten, 1976).

Programs employing direct and continuous measurement of
performance and the use of these data to make corrections in
subsequent programs have considerable empirical support
(Ysseldyke & Mirkin, 1982). The major intervention component
in these programs is continuous measurement itself. In
addition, these interventions commonly employ reinforcement
and feedback. Two such programs, namely, precision teaching
and data-based instruction, have been particularly successful
in addressing math and reading behaviors (Bohannon, 1975;
Bradfield, Brown, Kaplan, Rickert & Stannard, 1973; Deno,
Chiang, Tindal & Blackburn, 1979; Haring & Krug, 197-^;
Haring, Maddux & Krug, 1972; Mirkin, 1978; Mirkin, Deno,
Tindal & Kuehnle, 1980). Findings from these research
efforts point to the importance of utilizing the continuously

in the intervention. Thi

establishment of decision rules based on the success of daily
goal attainments (Liberty, 1975). The use of students to
grade and graph their own progress has also been shown to be
an effective method for utilizing continuously collected data
(Fruness, 1973).

The skills training models, while demonstrating
intervention validity for the assessment procedures employed,
have not addressed the issue of intervention bias. There is
no evidence reported in this literature bearing on the
differential impact of the interventions across groups.
Consequently, the potential intervention bias of the
assessment procedure has yet to be determined.

Bias with unplanned data. The decision-making process
is a complex one that draws information from a variety of
sources in reaching decisions. Some of the data used in the
process have validity for predicting the outcomes of interest
while others do not. Likewise, some of the data employed in
decision-making are biased in that they predict outcomes
differentially across groups. In the last section of this
chapter we saw how planned data from only a limited number of
procedures have been demonstrated to have outcome validity
with respect to intervention planning and, of these
procedures, none has been empirically studied to determine
intervention bias. In this section we turn attention to the
potential outcome bias in the development of interventions

and is drawn from direct or vicarious experience with a child
or children believed similar to the child under study.
Clinical impression, the influence of naturally occurring
characteristics, such as race, sex, socio-economic status and
attractiveness, and the impact of the referral problem on
decision making can all be classified as unplanned data.
These data can either directly or indirectly influence
decision making, and their inclusion in decision making can only
be justified on the grounds that their use increases the
validity of the decisions made. With respect to bias, their
use would have to preclude differentially effective
interventions across groups.

The literature on clinical impressions in this area is
negligible. The few studies that exist provide no support
for the use of clinical impressions in either diagnosis or
treatment (Kazdin, 1978). Several studies on the
influence of naturally occurring pupil characteristics have
recently been reported in the literature. It is the
assumption of this literature that employing factors such as
race, socio-economic status and physical attractiveness is
inappropriate and a biasing factor in the decision-making
process. However, whether or not the use of these factors
results in intervention bias is an empirical question. There
is a question of fairness, however, that is posed by the use
of these factors. As discussed in Chapter 4, those who hold

the philosophy of unqualified individualism consider their use


philosophy of qualified individualism would find their use unfair,
since, if they have any predictive utility, it is only because
they are correlated with psychologically meaningful
variables. In other words, they have no intrinsic meaning.

Those studies that have specifically examined the
influence of naturally occurring characteristics have
attempted to identify the unconscious impact of these factors
on special education decision making. Typically, these
studies have manipulated race (Frame, 1979; Matuszek &
Oakland, 1979; Tomlinson, Acker, Canter, & Lindborg, 1977),
SES (Frame, 1979; Matuszek & Oakland, 1979; Ysseldyke &
Algozzine, 1979), and physical attractiveness (Ross & Salvia,
1975; Salvia & Podol, 1975; Ysseldyke & Algozzine).

Research on the influence of race on decision making has
not shown race to be a significant variable in influencing
school psychologists' diagnoses (Frame, 1979). With respect
to placement decisions, Frame (1979) found an interaction
between race and SES but in an unexpected direction. In this
study lower-class black children were less likely to be
recommended for placement than upper-class blacks or lower-
and upper-class whites. In the Matuszek and Oakland (1979)
study, race did not influence school psychologists' placement
decisions but SES did. Consistent with Frame's study, lower
SES children were less likely to be recommended for placement
than upper SES children. In the Matuszek and Oakland study,
w.  r.'   not-    i  n  n  U'-'nr-.-'i   by   thir-ir    nlnromf-nt   decisions  by 

L'^'CO-VT.on-v- t  icns   c:    bl-u^k,    N.st-ivi-    Anu-ncons,    Indians  -lO: 
Orientals, Tomlinson et al. (1977) similarly found that
special education placement was more likely for whites than
the minorities in their study. In addition, they found that
minorities were more likely to be recommended for resource room
placement. Ysseldyke and Algozzine (1979) also found that
the participants in their computer-simulated decision making
study (i.e., school psychologists, special and regular
education teachers, administrators, counselors, nurses and
social workers) reported that SES influenced their diagnostic
decision making more when students were from high SES
families rather than low SES families. However, SES had no
measured impact on their actual diagnostic decisions.
Reynolds (1982) suggests that the trend not to place lower
SES black children in EMR special education classes may be a
consequence of psychologists' tendency to rate the "true
intelligence" of these children higher than their performance
indicates.

In a study examining the influence of physical
attractiveness on diagnoses made by classroom teachers, Ross
and Salvia (1975) found that less attractive children were
more likely to be diagnosed mentally retarded than more
attractive children. Similarly, Salvia and Podol (1975)
found that speech therapists rated identical speech samples
lower when they believed the sample was from a child with a
visible cleft palate than when they believed it

The literature on naturally occurring
characteristics indicates that race, SES and physical
attractiveness appear to be important variables for further
study in special education decision-making. The fact that
they are used raises three questions. First, are they valid?
Second, are they biased in the sense that their use results
in differentially effective interventions? Third, are they
fair? The last question, of course, is not an empirical one.

The last factor receiving some attention in the empirical
literature that can be identified as unplanned is the
influence of the type of referral on special education
decision-making. Ysseldyke and Algozzine (1979) report that
the diagnostic decisions regarding emotional disturbance were
influenced by the referral in their study. The fact that a
child was referred because of a behavior problem had an
impact on diagnostic decision-making independent of the
planned data that were provided to the decision maker. No
evidence is available regarding the validity of using the
reason for referral as data for decision making or the
potential bias of its use.
Proposed Alternatives to Traditional Assessment

As noted in previous chapters, many criticisms have been leveled at
traditional testing and assessment. As the limitations of
traditional testing practices became more salient and widespread,
alternatives to these procedures emerged. In some cases the alternatives
have a history in their own right and have been examined for their utility
as nonbiased measures only recently. In other cases, completely new
assessment strategies have emerged, evolving out of completely new
theoretical and conceptual models. Also, there has been a whole set of
attempts to revise or modify the more traditional procedures to meet
various definitions of nonbiased testing.

In this chapter some assessment techniques and procedures that
provide alternatives to traditional measures are reviewed. This chapter
provides a review of culture-reduced testing, renorming, adaptive behavior
measures, Piagetian strategies, learning potential assessment, diagnostic
clinical teaching, child development observation, neuropsychological
assessment, and behavioral assessment. Relatively more attention is
devoted to behavioral assessment because these strategies have rarely
been presented as alternatives to traditional measures within the context
of test or assessment bias. Included in the section on behavioral
assessment is a discussion of the conceptually compatible area of
criterion-referenced testing.

Within the psychoeducational assessment procedures outlined in
subsequent sections of this chapter some unique features of child behavior
should be considered. These include the referral source,


(Kvans       llrlr^ov  ,    P*//,    pi^  .    0  ^ '^()'>)  . 

The first issue is that the learning and behavior problems of
children are usually identified by a socialization agent
(e.g., parent, teacher, physician). Thus, the referral may bear little
or no relationship to the child's perception of the problem, although
this may not always be the case.

Second, the young child is typically under much stronger and more
obvious sources of social control than the adult or adolescent client,
thereby requiring that the child's social environment be adequately and
thoroughly assessed. Even when an organic etiology can be documented
(e.g., definite and specific brain damage), a comprehensive
environmental assessment should be completed.

A third consideration is that children with serious learning and
behavior problems are very commonly already involved in some attempt
to alter their behavior. Such alterations may occur prior to, during, or
after some therapeutic intervention. For example, the psychologist
conducting a behavioral assessment of a child experiencing reading
difficulties may find that the child is receiving special attention
from the classroom teacher, tutoring after school, or possibly involvement
in one of many packaged remedial programs in the home. Such issues
should prompt assessment of these programs and their contributions
(positive or negative) to other interventions.

The fourth characteristic of child assessment is that it is frequently
linked with cognitive or physical assessment. This position assumes that
some learning problems are inexorably linked to physical or cognitive
development. Thus, in addition to assessment of the child's learning
problems, assessment should usefully focus on vision, hearing, speech
and other relevant physical problems. Developmental variables are
intricately involved in assessment and behavior. Evans and Nelson (1977)
indicate that "the younger the child the more must deviation from
developmental norms be the measure of abnormality" (p. 605). Thus, the
conventional notion of "spontaneous" recovery or improvement must be
taken into account. For example,

it has been observed that some problems (e.g., poor concentration)
show a more pronounced decrease with age than others (e.g., poor
schoolwork). Children experiencing fears and related problems may not
require extensive intervention for successful elimination of the problem
(Morris & Kratochwill, 1983). The professional must then consider that
the social significance of certain behavior and/or learning problems
v;i  n    .-I  1  lev   with  • 

The notion of culture as it relates to bias in testing has received
much attention due to its inherent implications in psychological
assessment. Since psychological and educational measures are most often
samples of behaviors (i.e., behaviors sampled at one point in time),
certain assumptions regarding environmental or cultural impact on the
acquisition of those behaviors are inherent in the design of any test.
The concept that any measure of behavior must be understood as an
interaction between one's genetic predisposition and the culture of the
individual whose behavior one is attempting to measure has been
recognized throughout the history of testing. In an attempt to minimize
the impact of cultural
i,.  t         .■>:,!>  ipl  .  ,    I'.in-  t  ,    in    1  !h'    < '  ■  ■  v  .  >  1  o  i      •  u  t    nl     !         1  M  1    v-i  M.>n     .1  1.1

intfl  1  r    t.-st,      .  v\\\         yj'^    ilnii;,    to   r  1  i  iii  i  ii.'H  . '    thosr    Ur    tell,   wr  r  r 

overly dependent on information specific to the culture of the school or
to that culture transmitted more readily in educated homes (Jensen, 1980).

Despite the almost universal acceptance that any sample of test behavior is
a product of an interaction between one's inherent potential and the culture
from which he/she comes, much debate is still underway regarding the degree
of the impact of the environment on behavior, the differences of the impact
due to variation in culture, and the extent to which the content of tests
reflects cultural differences that may result in bias.

A variety of terms have been offered in recent years to discuss the
last of these topics. One of the first terms used in the literature to
connote a test that was free of cultural influence was "culture free."
While receiving a flurry of attention in the fifties, this concept has
since been discredited on the grounds that any test of mental ability
must depend on some experiences acquired in some culture. Simply stated,
a test can't be "free" of culture. In addition, attempts to design
"culture free" tests did not meet with those results expected by their
designers; that is, no cultural group mean differences in performance.

The concept of "culture free" soon gave way to the concepts of
"culture fair" and "culture reduced." These reconceptualizations were
designed to connote tests or test items that "reduce" dependency on a
particular culture for correct responding and, therefore, are "fair" to
all individuals regardless of cultural background.

The Cattell Culture Fair Test of g and the Raven's Progressive
Matrices are two such tests that have attempted to reduce the degree to
which the culture in which one lives influences performance. Other tests
such as the Leiter International Performance Scale, the French Pictorial
Test of Intelligence and the Nonverbal Test of Cognitive Skills also fall
into this classification. One of the basic aims of these types of tests
is to reduce verbal content. To varying degrees the amount of verbal
instructions for the items and the amount of verbal responding required
to correctly answer items are minimized. In addition, the content of
items is usually designed to reflect novel situations requiring the
application of complex cognitive skills as opposed to situations heavily
dependent on cultural experience. Such tests have been touted by some
to be less biased against cultural minorities and a more acceptable measure
of intelligence than traditional IQ tests such as the WISC-R and Stanford-
Binet. Research evidence, however, indicates that mean differences across
cultural groups are approximately the same in culture-fair tests as they
are in conventional IQ tests (Arvey, 1972). This provides additional
evidence for those who argue that the differences between groups on mental
ability tests are real differences in intellectual ability (Jensen, 1980).

Tests or test items that are not culturally "reduced" are often
termed "culturally bound" or "culturally loaded". The degree of cultural
influence on a test or test items can be viewed on a "hypothetical
continuum" ranging from "culture free" to "culture bound" (Jensen, 1980).
While this continuum provides for a simple conceptual understanding,
distinctions along the continuum have in recent years become less
important than the now more popular distinction between "cultural bias" and
"cultural loading". Those who use the terms "cultural bias" and "cultural
loading" when speaking of the influence of culture on tests and/or test
items tend to devalue the importance of the hypothetical continuum,
suggesting instead that it is a given that all test items are culturally
loaded. The degree to which an item or test is culturally loaded cannot
be an indicator of bias, since we have no criteria upon which to
judge its place on the continuum (Jensen, 1980). The judgment of whether
a test or an item is "culturally biased" or simply "culturally loaded" is
an empirical question and should be treated as such. Clarizio (1979),
for example, argues that there has been much confusion in the intelligence
testing literature due to a failure to distinguish between cultural bias
and cultural loading.

Renorming

An alternative to designing tests with culture-reduced content
in an effort to equate performances across groups is to use conventional
tests such as the WISC-R and interpret the results so as to equate
performance across groups after the fact. The content of the test remains
the same for all, and the administration doesn't change. What changes is
what one does with the data after it's collected. By far the most
popular attempt to employ this approach is the System of Multicultural
Pluralistic Assessment (SOMPA; Mercer, 1979).

The SOMPA is a comprehensive system for assessing intellectual
functioning that is based on a cultural and structural pluralistic model
of society. Basically, Mercer (1979) argues that traditional testing
practices are based on the Anglo conformity model of society. Tests based
on this model assume "that all children in American society are either
being reared in families that have been both culturally and structurally
integrated into the Anglo core culture, or are members of families of
ethnic groups that are culturally integrated even though they may still
be structurally separate and identifiable" (Mercer, p. [illegible]).

Interpreting tests in such a way results in assumptions about test
performance that are unfair. If all children are not integrated into the
Anglo core culture, then tests based on assumptions of common cultural
experience are going to be biased against those who are not. According to
this view, that we live in a pluralistic society requires, argues Mercer,
interpretive schemes that are capable of comparing performance only among
children from similar backgrounds. The SOMPA, in part, is purported to do
just that.

The SOMPA employs the use of ten measures within three assessment
models: the medical model, the social systems model, and the pluralistic
model. The medical model is employed to determine if the child is
biologically normal. The six measures include Physical Dexterity Tasks,
Weight by Height, Visual Acuity, Auditory Acuity, Health History
Inventories and the Bender Visual Motor Gestalt Test. Two measures are
employed to assess the child from a social systems model perspective:
the WISC-R and the Adaptive Behavior Inventory for Children, called the
ABIC. The purpose for their use is to gauge how well the child is
performing relative to his/her various social roles. Based on a model of
social deviance, the measures attempt to identify if the child is
abnormal in the sense that he/she deviates from the expectations of
others in the group. Within this context WISC-R scores are perceived as
achievement scores and labeled School Functioning Level (SFL). The
purpose of the pluralistic model is to assess the child's learning
potential. It employs the WISC-R for this purpose but adjusts scores
according to the social and cultural group from which the child comes.
In order to make the adjustments, the Sociocultural Scales are employed.
These scales ask questions in four areas: Family Size, Family Structure,
Socioeconomic Status, and Urban Acculturation. Three equations, one each
for Verbal, Performance, and Full Scale IQs, are employed to estimate the
learning potentials separately for Black, Hispanic, and Anglo children.
A child is first classified as Anglo, Black, or Hispanic. Then his/her
scores on the Sociocultural Scales are computed and weighted in
accordance with the group to which he/she belongs. Finally an Estimated
Learning Potential (ELP) is derived. Similar to the WISC-R IQ, the
Estimated Learning Potential scores have a mean of 100 and a standard
deviation of 15.
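
The adjustment described above can be illustrated with a minimal sketch.
It assumes, for illustration only, that each equation predicts the
expected WISC-R score from the four Sociocultural Scales and that the ELP
re-expresses the child's distance from that prediction on a scale with
mean 100 and standard deviation 15. The intercept, weights, residual
spread, and all names below are hypothetical; the actual SOMPA equations
are given in Mercer (1979) and are not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class GroupEquation:
    """Hypothetical equation for one ethnic group and one WISC-R scale."""
    intercept: float
    weights: Dict[str, float]   # one weight per Sociocultural Scale
    residual_sd: float          # spread of WISC-R scores around the prediction


def estimated_learning_potential(wisc_score: float,
                                 scale_scores: Dict[str, float],
                                 equation: GroupEquation) -> float:
    """Return an ELP-style score (mean 100, SD 15) under the stated assumptions."""
    # Expected WISC-R score for a child with this sociocultural background.
    predicted = equation.intercept + sum(
        weight * scale_scores[name] for name, weight in equation.weights.items()
    )
    # Distance from the expectation, in residual standard deviation units.
    z = (wisc_score - predicted) / equation.residual_sd
    return 100.0 + 15.0 * z


# Invented coefficients, not taken from the SOMPA manual.
HYPOTHETICAL_EQUATION = GroupEquation(
    intercept=48.0,
    weights={
        "family_size": 0.1,
        "family_structure": 0.2,
        "socioeconomic_status": 0.3,
        "urban_acculturation": 0.2,
    },
    residual_sd=12.0,
)

scales = {
    "family_size": 50.0,
    "family_structure": 50.0,
    "socioeconomic_status": 50.0,
    "urban_acculturation": 50.0,
}

elp_at_expectation = estimated_learning_potential(88.0, scales, HYPOTHETICAL_EQUATION)
elp_above_expectation = estimated_learning_potential(100.0, scales, HYPOTHETICAL_EQUATION)
```

Under these invented values a child scoring exactly at the level expected
for his/her background receives an ELP near 100, whatever the raw WISC-R
score, which is the sense in which the adjustment is made "after the
fact."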

The standardization of the SOMPA is based on a sample of 2085
California children representing equal numbers of Black, Hispanic and
Anglo children. It is appropriate for use with children aged 5 to 11,
inclusive. The various measures are taken in interview with the child's
parent or guardian and through direct testing of the child.

While the SOMPA has been criticized for a variety of reasons, the most
pointed criticisms involve its conception of the ELP. While the
pluralistic model on which the measure is based is by no means
controversial, the use of this model to derive estimates of learning
potential is. Most would agree that, all things being equal, the learning
potential of children from various cultural groups would be equal. The
major criticism arises from how and why the ELP is derived. Goodman
(1979a), for example, points out that gathering a few facts about the
SES, acculturation and the family of the child is inadequate to gain an
accurate picture of a child's learning potential. Clarizio (1979) argues
that the potential impact of using such a system will do more harm than
good. By declassifying children who are now eligible for special
education placement, many needed services will be lost. Goodman points
out that the ELP carries with it the assumption that children whose
scores are not raised by adjustment to WISC scores are biologically
impaired. She argues that this conclusion is not only false but a
throwback to the days when pure measures of biological potential were
believed possible.

As these and other authors have pointed out, the true value of the ELP
construct rests in its validity. In this regard, the ELP has little to
support its use. The relationship of the ELP to academic achievement is
lower than that of the IQ to academic achievement (Oakland, [year
illegible]). Consequently, the use of the ELP in place of the IQ reduces
one's ability to predict academic achievement. In defense of the ELP,
Mercer has argued, as we have in this report, that predicting to
standardized achievement tests may have its problems since both may be
measuring the same thing, achievement, not potential. Consequently, the
fact that ELP scores do not predict academic achievement does not
automatically invalidate it. However, arguing that IQ is not a measure of
potential and performance on academic achievement tests is not a measure
of learning says nothing about the validity of the ELP. If the ELP has
predictive validity then it must be demonstrated that it is related to
some measure of learning, if not standardized achievement tests, then
some other measure of learning. Finally, if it is used for the purpose
for which it was intended (i.e., special placement) then the probability
of those placed receiving effective intervention would have to be
demonstrated, the predictions would have to be better than those derived
from use of IQ tests, and it would have to be shown that its use would
result in equally effective treatment across groups.

The measurement of adaptive behavior as a unique component in the
assessment of mental retardation is a relatively recent occurrence. As
Reschly (1982) points out, early definitions of adaptive behavior, as
expressed in the American Association on Mental Deficiency manual
(Heber, [year illegible]), have emphasized learning ability as the
criterion for evaluating adaptive behavior in school-age children.
Consistent with this conceptualization, adaptive behavior had
traditionally been measured through the use of standardized achievement
tests for this age group. Consequently, a child who performed poorly on a
measure of intellectual functioning [in the borderline or mentally
deficient range on Wechsler's classification system (Wechsler, 1974)]
and, in addition, performed poorly in school as defined by a measure of
academic achievement, qualified as EMR or educably mentally handicapped.
Recent definitions, however, provide a view of adaptive behavior that is
multifaceted, requiring more than the measurement of academic achievement
to assess its various components. The 1977 edition of the AAMD manual
(Grossman, 1977), for example, suggests that essential coping skills such
as those involving the concepts of time and money, self-directed
behaviors, social responsiveness, and interactive skills be included in
definitions of adaptive behavior. This change requires that more
attention be given than previously to the measurement of adaptive
behavior. Given the judicial and legislative mandates for its use (see
Chapter 8), those
involved in the assessment of mental retardation have been searching for
adequate measures.

Despite this need, the lack of a clear specification of what constitutes
adaptive behavior has resulted in the publication of a variety of
different tests and procedures that all purport to measure adaptive
behavior. Reschly (1982) identifies four features among the various
definitions that are noteworthy for discussion: (1) developmental, (2)
cultural context, (3) situational or generalized, and (4) domains. The
developmental feature refers to the stated or implied understanding in
all definitions that the criteria for assessing adaptive behavior change
with age. This is exemplified in the AAMD criteria (Grossman, 1977) that
specify sensorimotor skills development as criteria during infancy and
early childhood and not during childhood, adolescence or adult life.
With regard to the second feature, cultural context, Reschly (1982)
reports that most definitions of adaptive behavior acknowledge the
importance of cultural influences on the development of adaptive behavior
and recognize the need to interpret adaptive behavior within the context
of the individual's cultural background. The third feature of adaptive
behavior definitions refers to the dynamic nature of the construct. As
mentioned above, current definitions of adaptive behavior are
multifaceted, regardless of the age group to which one is referring. The
question then arises concerning the relationship among the various facets
or domains and the influence of different social systems and roles on
each. Most definitions, according to Reschly, imply that the various
domains are situation specific in that they are functionally independent
sets of skills and that cultural background may influence differently the
acquisition of each set of skills. The fourth feature addresses the issue
of which domains are covered by the various definitions. Reschly reports
that nearly all include the notions of self-

Almost all adaptive behavior measures employ either a parent, guardian,
or teacher as respondents (see Table 7.1). Only one measure, the
Children's Adaptive Behavior Scale (Richmond and Kicklighter, 1981), is
designed as a paper and pencil test for collecting data from children and
one, the Vineland (Doll, 1965), may be used for interviewing the client
if the assessor is extensively trained. As reported in Table 7.1, most of
the measures of adaptive behavior range in the time they take to
administer from 20 minutes to one hour.

Given the recency of the restructured concept of adaptive behavior and
measures conceived to reflect this concept, it is easy to understand the
limited availability of validity evidence. While some of the measures
have adequate construct validity, there is very little research reporting
on the outcome validity of the measures. There has been some research
reporting on the technical test bias of these instruments and the little
research that has been conducted on outcome validity has focused on
outcome bias. Reviewed below are two popular measures of adaptive
behavior, measures we judge to be the most psychometrically sound for use
in special education decision-making.

AAMD Adaptive Behavior Scale - School Edition (ABS-SE)

The ABS-SE (Lambert, 1981) and its predecessor, the Adaptive Behavior
Scale - Public School Version (ABS-PSV; Lambert, Windmiller and Cole,
1975), were designed from the original AAMD adaptive behavior scale
(i.e., Adaptive Behavior Scale - Clinical Version; Nihira, Foster,
Shellhaas and Leland, 1969) developed at Parsons State Hospital and
Training Center in Kansas with the support of NIMH. This original scale
and its 1974 revision were designed for instructional planning for the
severely retarded. The ABS-PSV was an offshoot of the ABS-CV and was
developed to help in both the instructional planning and
classification/placement of children in special education classes. The
ABS-PSV retained most of the items from the ABS-CV, eliminating those
that could not easily be answered by teachers, who are typically used as
third party respondents. The revised ABS-SE was developed in "response to
the need of persons working in the field who have asked that the
procedures be revised and that the reference-group norms be expanded to
cover a wider age range" (Lambert, 1981, p. 3). The ABS-SE was designed
for use with children and youths ages 3 through 17. The items on the
ABS-SE are the same as those on the ABS-PSV.

One of the major revisions to the ABS-PSV is in the interpretive scheme.
The ABS-PSV consists of two sections, the first reporting on adaptive
functioning in nine skill areas or domains, the second on maladaptive
behavior in 12 domains. The 21 domain raw scores are converted to
percentile ranks and then compared with those in regular and special
education classes. It is suggested in the manual (Lambert et al., 1975)
that those children who perform similar to or worse than 75% of children
classified as EMR, for example, on many of the domains, can be
comfortably identified EMR.

The ABS-SE contains the same items but provides a refined scheme for
interpretation. For diagnostic purposes, a diagnostic profile of the
child's performance on five factors is computed. These five factors,
labeled Personal Self-Sufficiency, Community Self-Sufficiency,
Personal-Social Responsibility, Social Adjustment, and Personal
Adjustment, were derived from factor analytic studies (Nihira, 1969a;
Nihira, 1969b; Guarnaccia, 1976; Lambert & Nicoll, 1976). Factor scaled
scores are compared with the norm groups of interest. If the child is
being considered for EMR diagnosis, for example, factor scaled scores are
compared with EMR and regular class children. Factor scaled scores that
are more than one standard deviation below the mean are considered
diagnostically significant.
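
As a rough illustration of the one-standard-deviation rule just
described, the sketch below flags factors on which a child's scaled score
falls more than one standard deviation below a norm group's mean. The
function name, the norm mean of 10, the standard deviation of 3, and the
child's scores are assumptions made for the example and are not values
from the ABS-SE manual.

```python
from typing import Dict, List, Tuple


def flag_low_factors(child_scores: Dict[str, float],
                     norms: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return factors scored more than one SD below the norm-group mean.

    `norms` maps each factor name to a (mean, standard deviation) pair for
    the norm group of interest, e.g. regular class children.
    """
    flagged = []
    for factor, score in child_scores.items():
        mean, sd = norms[factor]
        # "More than one standard deviation below the mean" is a strict cutoff.
        if score < mean - sd:
            flagged.append(factor)
    return flagged


FACTORS = [
    "Personal Self-Sufficiency",
    "Community Self-Sufficiency",
    "Personal-Social Responsibility",
    "Social Adjustment",
    "Personal Adjustment",
]

# Hypothetical norms: every factor scaled to mean 10, SD 3.
hypothetical_norms = {factor: (10.0, 3.0) for factor in FACTORS}

# Hypothetical child profile.
child = dict(zip(FACTORS, [6.0, 7.0, 5.0, 11.0, 9.0]))

low_factors = flag_low_factors(child, hypothetical_norms)
```

A score exactly one standard deviation below the mean (7 in this example)
is not flagged, since the rule as stated requires more than one standard
deviation.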

In addition to the factor scaled scores, a Comparison Score composed of
the weighted scores on three factors, Personal Self-Sufficiency,
Community Self-Sufficiency and Personal-Social Responsibility, is
computed to aid in classification and placement. It is suggested that
performance in the bottom 5 percent of the Regular group signifies the
possibility of mental retardation (Lambert, 1981).

The standardization of the ABS-SE is on children and youth in regular,
EMR, and TMR classes. A total sample of 6,523 children and youth from
California and Florida was used. Several studies are reported in the
manual offering evidence of the validity of the ABS-SE. Studies on the
content, construct, predictive, and outcome validities of the Domain,
Factor and Comparison Scores are offered. Several overall conclusions can
be reached regarding the validity of the ABS-SE. First, the internal
construct validity appears adequate. The ABS-SE is organized around
empirically structured clusters of items that were thoughtfully selected
and analyzed. Second, adequate evidence is offered concerning the
external construct validity of the ABS-SE. Evidence indicates that the
various scores derived from Section I of the ABS-SE have, as expected,
low to moderate correlations with IQ. Both Section I and II factor scores
correlate moderately with standardized tests of achievement. Third,
evidence of the outcome validity of the test is offered to help in the
selection of students. Domain scores for both Sections I and II of the
ABS-SE discriminate between those placed in regular, EMR, and TMR
classes, although Section I domain scores appear to discriminate better
than Section II scores. Comparison scores, derived for the most part from
items included in Section I factors, show considerable accuracy in
identifying the extent to which a child's performance is like that of
students in regular, EMR, or TMR programs.

The internal consistency of the items making up each of the factors is
offered as evidence of the reliability of the ABS-SE. Overall, these
reliability coefficients are high, with three of the factors (i.e.,
Community Self-Sufficiency, Personal-Social Responsibility and Social
Adjustment) sufficiently high to use in interpreting individual profiles.
Personal Self-Sufficiency and Personal Adjustment reliability
coefficients are too low to recommend for individual profile
interpretations. In addition to information on the internal consistency
of the data, standard error of measurement information is provided in the
manual to help in interpretations.

Evidence offered in the ABS-SE manual indicates that neither ethnic
status nor sex is associated with performance on the domains in Section I
of the ABS-SE. Also, no mean score differences were found on Section I
among ethnic classes and between sexes at each of the three levels of
classification (i.e., regular, EMR, and TMR) (Cole, 1976). Ethnic status
and sex did significantly contribute to Section II performance on the
ABS-SE. The Cole (1976) study of mean differences among ethnic groups
within classification showed differences among ethnic groups, but the
effects of this variable explained only 1 to 2 percent of the variance in
performance. Consequently, as a function of there being no substantive
mean differences across groups or sexes, there are good chances that the
test is unbiased in that it is measuring the same construct equally well
for all and will predict equally well for all. However, this is only an
inference and empirical evidence should be gathered in its support.
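
To make concrete what it means for a grouping variable to explain only
1 to 2 percent of the variance, the following sketch computes the
proportion of total score variance that lies between groups
(eta-squared). This is offered as one common way of expressing explained
variance, not necessarily the statistic Cole (1976) reported, and the
group labels and scores are invented for illustration.

```python
from typing import Dict, List


def eta_squared(groups: Dict[str, List[float]]) -> float:
    """Proportion of total variance accounted for by group membership."""
    all_scores = [score for scores in groups.values() for score in scores]
    grand_mean = sum(all_scores) / len(all_scores)

    # Total sum of squares: every score against the grand mean.
    ss_total = sum((score - grand_mean) ** 2 for score in all_scores)

    # Between-group sum of squares: each group mean against the grand mean,
    # weighted by group size.
    ss_between = sum(
        len(scores) * (sum(scores) / len(scores) - grand_mean) ** 2
        for scores in groups.values()
    )
    return ss_between / ss_total


# Hypothetical domain scores for two groups whose means differ by 2 points
# while scores within each group spread over 20 points.
hypothetical_scores = {
    "group_a": [40.0, 45.0, 50.0, 55.0, 60.0],
    "group_b": [42.0, 47.0, 52.0, 57.0, 62.0],
}

explained = eta_squared(hypothetical_scores)  # about 0.02, i.e. roughly 2 percent
```

With a large enough sample a difference of this size can reach
statistical significance while still leaving about 98 percent of the
variance in performance unrelated to group membership.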

Mastenbrook (1977) criticizes the content of the ABS-PSV, which is the
same in the ABS-SE, in that it emphasizes self-maintenance type behaviors
with little regard for behaviors concerning social roles outside the
school. This criticism, along with the limitations in the standardization
of the instrument (i.e., only two states are represented), needs to be
taken into account when considering the use of the ABS-SE.

Adaptive Behavior Inventory for Children (ABIC)

The ABIC was developed as part of the comprehensive System of
Multicultural Pluralistic Assessment (SOMPA; Mercer, 1979). However, the
ABIC can be administered and scored separately. Consistent with the
purpose of the SOMPA, the ABIC was designed for the major purpose of
classification/placement. The measure has norms for children five through
11 inclusive and in this sense is more limited than the ABS-SE. It
employs parents as third party respondents, takes approximately one hour
to administer, and can be administered by paraprofessionals with training
(see Table 7.1). Its major advantage in comparison to other measures is
its assessment of behaviors across a variety of settings. Other measures
do not specifically address the child's functioning in different
environments as well as the ABIC (see Table 7.1).

The items for the ABIC were derived from a conceptualization of adaptive
behavior that conceived it as "an adaptive fit in social systems through
the development of interpersonal ties and the acquisition of specific
skills required to fulfill the task functions associated with particular
roles" (Mercer, 1979, p. 93). The six scales include family, community,
peer, non-academic school, earner/consumer, and self-maintenance.
Performance in school, while still conceived as a component of adaptive
behavior, is measured by performance on the WISC-R in Mercer's system.
The ABIC consists of a total of 242 items, reduced from an initial item
pool of 480 items. The choice of items was heavily based on intensive
interviews with mothers, and items chosen for the scales were identified
through an analytic sorting procedure. Special attention was given to the
choice of items that were believed to be less likely to show differences
across race and sex.

The norms for the ABIC are based on a stratified random sample of 2085
California children between the ages of 5 and 11. The sample was
stratified according to ethnic/racial group, sex, age, and size of
community. Raw scores derived from the six scales are converted to scaled
scores with a mean of 50 and a standard deviation of 15. Standard error
of measurement information is provided so that probability statements can
be made regarding the range within which the child's true score lies.

The split-half reliabilities of the various scales, ages, and
ethnic/racial groups are provided. They range from .78 to .92 with a
median reliability coefficient of .86.
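
The link between the reliability coefficients and the standard error of
measurement reported for the ABIC can be sketched with the classical test
theory relation SEM = SD x sqrt(1 - reliability). The example uses the
scaled-score standard deviation of 15 and the median reliability of .86
given above. The observed score of 35, the 95 percent level, and the
choice to center the band on the observed score are assumptions for
illustration, and the manual's own tabled values should take precedence.

```python
import math
from typing import Tuple


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical test theory SEM: sd * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)


def score_band(observed: float, sd: float, reliability: float,
               z: float = 1.96) -> Tuple[float, float]:
    """Band around an observed score; z = 1.96 gives roughly 95 percent."""
    sem = standard_error_of_measurement(sd, reliability)
    return observed - z * sem, observed + z * sem


sem = standard_error_of_measurement(sd=15.0, reliability=0.86)  # about 5.6
low, high = score_band(observed=35.0, sd=15.0, reliability=0.86)  # about 24 to 46
```

Under these assumptions a scaled score of 35, one standard deviation
below the mean, carries a band of roughly 24 to 46, which suggests why
single scale scores call for cautious interpretation.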

The correlations between the WISC-R and the ABIC, as reported, are low,
ranging from near zero to .3 (Kazimour & Reschly, 1980; Mercer, 1979;
Tebeleff & Oakland, 1977). Similar correlations between the ABIC and
measures of academic achievement are reported (Sapp, Morton, McElroy &
Ray, 1979; Tebeleff & Oakland, 1977). In a comparison of scores on the
ABIC across racial/ethnic groups (i.e., White, Black and Hispanic),
Mercer (1979) reports that the significant differences that were found
were too small to be of any practical significance in interpreting scores
for individual children. Grindby and Mastenbrook (1977) report scores for
lower income Mexican-American children lower than other groups. Sattler
(1982) criticizes the ABIC for not providing the opportunity for an
assessment of whether or not such differences are a function of decreased
opportunities in the child's environment rather than a child's lack of
ability. Sattler (1982) provides three additional criticisms of the ABIC.
First, it relies exclusively on the questionable responses of parents or
guardians. Second, some of the items may discriminate against low SES and
minority group children. Third, the norms for the ABIC may not be
adequate for use outside of California. Buckley and Oakland (1977), for
example, report lower scores for Texas children than California children.

It should be noted that Mercer (1979) argues that the validity of the
use of the ABIC should not be based on its relationship to academic
achievement or IQ. Rather, the purpose of the ABIC is to assess the
extent to which the child is meeting expectations within the social
systems in which he/she is functioning. Consequently, Mercer argues that
the predictive utility of the ABIC should be judged against such
criteria. However, as Oakland (1979) points out, since these criteria are
not available, the predictive validity of the ABIC according to Mercer's
definition remains unknown.

Piagetian Assessment Procedures
Although there has been a noticeable lack of reliance on
standardized intelligence tests in the development of
Piaget's theory (Brainerd, 1978), the clinical methods
derived from this area are being used as an alternative to
traditional procedures. Piaget conceptualized human
development as occurring in a series of invariant stages. At
each stage a discrete set of mental operations is purported
to be used in organizing experience and adapting to the
environment. Although Piaget assumed that the way in which
experience is organized is genetically determined, the
environment is said to influence the rate of developmental
progress (Flavell, 1963).

Piaget outlined four main stages of cognitive or
intellectual development. These four main stages are
outlined in Table 7.2. The manner in which the child is
assessed and the responses scored is usually different from
more traditional testing. This perspective is reflected in
the statements by Elkind (1974):

...when we deal with (the child's thinking
processes) we must not evaluate them as right or
wrong but rather value them as genuine expressions

Table 7.2

Piaget's Stages of Cognitive Development

Stage                 Approximate   Primary features, especially toward
                      ages          the end of each stage

I. Sensorimotor       Birth to      "Thought" occurs primarily through
                      2 years         actions.
                                    Coordination of sensory input improves.
                                    Coordination of physical responses
                                      improves.
                                    Objects and people, including self, are
                                      differentiated from one another and
                                      recognized as permanent.

II. Preoperational    2 to 7        Language use and symbolic thought
                      years           increase.
                                    Egocentrism predominates.
                                    Centration (attending to a striking
                                      feature or part) rather than
                                      decentration (analysis of whole and
                                      parts) characterizes perception and
                                      thought.
                                    Produces mental images of static
                                      situations and things, rather than of
                                      processes and transformations.
                                    Irreversibility in thought (can think in
                                      one way but not its reverse; e.g.,
                                      counting, saying letters of the
                                      alphabet).
                                    Perceptibly similar objects are
                                      classified as alike.
                                    Words (names) are associated with some
                                      things and with some classes of things.

III. Concrete         7 to 11       Logical thinking using concrete objects
     Operations       years           occurs.
                                    Less egocentric and more socialized
                                      speech occurs.
                                    Conservation increasingly occurs.
                                    Decentering and reversibility occur.
                                    Understands changes and processes and
                                      more complex static events and
                                      relations.
                                    The same things are grouped correctly
                                      into two or more different classes.
                                    Relations among actual things and
                                      classes of things are understood; also
                                      relations among words that represent
                                      things and classes of things that have
                                      been experienced are understood.

IV. Formal            11 years      Mental operations in symbolic form are
    Operations        to adult        carried out and operations are
                                      performed on ideas as things.
                                    Comparisons, contrasts, deductions, and
                                      inferences from ideational content
                                      rather than concrete things and events.
                                    Relations between and among symbols
                                      standing for concepts that have not
                                      been experienced directly are
                                      understood.

Source: Adapted from Ginsburg and Opper, 1969.

Source:  Klausmei'er,  \l'.J.,  &  Goodwin,  W.  LearrtinR.and  human- 
abilities:  Educational,  psychology.  New'York:  Harper  5  Row,  1971.. 
(adapted  frbm  Ginsburg^nd  Opper,;.  1969).  BEST  CGPV  Al'illUEtt 
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of  the  child's  budding  mental  nbilities.  When  wo 
deal  with  spatial ,  temporal,  rausal,  or 

4 

'quantitative  concepts,  we  need  to  explore  the  kinds 

of  meahings  children  give  to  such  terms.  Such 

\      exploration  reveals  the  level  and  reference  frame 

of  the  child's  understanding,,    More  importantly, 

such  explorotiolh  avoids  the  inhibiting  suggestion 

that  the  child's  incomplete  (but  partially  correct) 
*  « 
understanding  of  such  terms  is  "wrong  "•     A  teacher 

who  sees  a  child's  productions  as  having  value,  as 

^      meaning  something,  avoids  putting  the  child  on  the 

tract  of  always  seeking  "right"  answers.  More 

importantly,  perhaps,  her  or lentat ion  conveys  to 

the  child  a  sense  of  her  attempt  to ' understand  him  ^ 

and  her  respect  for  .ler"  intellectual  productions 

(p.   125)  . 

An important aspect of Piagetian assessment is that the
course of cognitive development is said to be invariant in
sequence. In this regard a child can be in one of the four
stages (or in a transitional stage) in which he/she performs
tasks within a given stage. The child cannot miss a stage in
the usual sense because various cognitive structures or
schemes serve as the basis for all normal development. Thus,
the sequential mental development is said to occur in all
children regardless of race or social class. In this regard,
the Piagetian tasks may be less culture loaded than
conventional IQ measures (Jensen, 1980).

Some writers have noted that the Piagetian tasks are less
susceptible to influence by specific instruction than the
usual IQ measures (e.g., Kohlberg, 1968; Sigel & Olmsted,
1970). Some contrasts between the Piagetian and more
traditional psychometric approaches to intelligence are
presented in Table 7.3. As noted by Elkind (1974), the two
approaches differ on the following dimensions: the type of
genetic causality which they presuppose, the conceptions of
the course of mental development, and the relative
contributions of nature and nurture to intellectual skills.

There have been relatively few Piagetian assessment
measures developed for use in applied settings. However,
Struthers and DeAvila (1967) developed the Cartoon
Conservation Scales, a test for children that can be
administered on a group basis. The test seems to be
appropriate as a measure of cognitive development with respect
to certain aspects of the Piagetian conservation concept.
This assessment approach may prove valuable in that there
appears to be a similarity in cognitive development of
children from diverse cultural backgrounds when assessed on
certain Piagetian tasks (DeAvila & Havassy, 1975).

Table 7.3
Comparison of Piagetian and Psychometric Approaches to Intelligence

Similarities

1. Both accept genetic determinants of intelligence.
2. Both accept maturational determination of intelligence.
3. Both use nonexperimental methodology.
4. Both attempt to measure intellectual functions that the
   child is expected to have developed by a certain age.
5. Both conceive of intelligence as being essentially rational.
6. Both assume that maturation of intellectual processes is
   complete somewhere during late adolescence.
7. Both are capable of predicting intellectual behavior outside
   of the test situation.

Differences

1. Piagetian: Assumes that there are factors which give
   development a definite, nonrandom direction. Mental growth is
   qualitative and presupposes significant differences in the
   thinking of younger versus older children; concerned with
   intra-individual changes occurring in the course of
   development.
   Psychometric: Tested intelligence is assumed to be randomly
   distributed in a given population, with the distribution
   following the normal curve; concerned with inter-individual
   differences.

2. Piagetian: Views mental growth as the formation of new mental
   structures and the emergence of new mental abilities.
   Psychometric: Views the course of mental growth as a curve
   which measures the amount of intelligence at some criterion
   age that can be predicted from any preceding age.

3. Piagetian: Genetic and environmental factors interact in a
   functional and dynamic manner with respect to their
   regulatory control over mental activity.
   Psychometric: Genetic and environmental contributions to
   intelligence can be measured.

Note. Similarity items 5, 6, and 7 obtained from Dudek, Lester,
Goldberg, and Dyer (1969); the remainder of the table adapted from
Elkind (1974).

Source: Sattler, J.M. Assessment of children's intelligence and special
abilities (2nd ed.). Boston: Allyn and Bacon, 1982.

A number of Piagetian assessment procedures are reviewed
by Johnson (1976). However, many of these tap specific skills
(e.g., conservation of number, Swanson, 1976a) and represent
"experimental" or "research" instruments at this time. Thus,
their usefulness in nonbiased assessment remains unknown.
However, within the context of a broadened scope of
assessment, these various devices may be useful in the
assessment of certain specific skills (Johnson, 1976). One
commercially produced set of Piagetian assessment procedures
is the Concept Assessment Kit (CAK) (Goldschmidt & Bentler,
1968a). The CAK is for use in individual assessment of
children in such areas as conservation of number, substance,
weight, two-dimensional space, and continuous and
discontinuous quantities. The test has been reviewed in Buros
(7:437), in the Journal of Educational Measurement, 1969, 6,
263-269, and briefly by Jensen (1980). Generally, the CAK has
somewhat limited norms (i.e., 560 children in the Los Angeles
area) and a somewhat limited age range.

One of the more extensive reviews of Piagetian assessment
and the issues surrounding these measures in test bias was
presented by Jensen (1980). He noted that Piagetian
procedures show promise as culture reduced measures, but he
raised two important questions regarding these measures: "(1) Do
Piaget's tests measure a different mental ability than the g
measured by conventional IQ tests? and (2) do minority
children and (culturally disadvantaged) children perform
better on the Piagetian tests, relative to majority children,
than on conventional IQ tests?" (p. 673). In response to the
first question, Jensen (1980) notes that the correlations
between various intelligence and achievement tests and various
Piagetian measures of from 5 to 10 scales assessing concrete
operations range from .18 to .84 (mean = .50; see Table 7.4). In
the case of the Garfinkle (1975) study, Jensen (1980)
conducted a principal components analysis of the
intercorrelations among the 14 Piagetian tasks. He found that the
squared multiple correlation of each item with every other
item is comparable to that found on the Wechsler subtests.
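
The squared multiple correlation (SMC) referred to above has a standard closed form: for a correlation matrix R, the SMC of item i with all remaining items is 1 - 1/r^ii, where r^ii is the i-th diagonal element of the inverse of R. The following is only a minimal sketch of that computation. The matrix is made up for illustration (it is not Garfinkle's or Jensen's data), and the helper names `invert` and `squared_multiple_correlations` are hypothetical.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def squared_multiple_correlations(corr):
    """SMC of each variable with all the others: 1 - 1 / diag(inverse(R))."""
    inverse = invert(corr)
    return [1.0 - 1.0 / inverse[i][i] for i in range(len(corr))]


# Illustrative correlation matrix for three tasks (assumed values).
ILLUSTRATIVE_R = [
    [1.0, 0.5, 0.4],
    [0.5, 1.0, 0.3],
    [0.4, 0.3, 1.0],
]

smc = squared_multiple_correlations(ILLUSTRATIVE_R)
for task_number, value in enumerate(smc, start=1):
    print(f"task {task_number}: SMC = {value:.3f}")
```

With only two variables the SMC reduces to the squared correlation between them, which gives a quick check on the helper.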

Table 7.4
Correlation (r) between Piagetian tests and various measurements of
intelligence and scholastic achievement.

Variable                               r          Study

Intelligence Tests
Stanford-Binet MA                      .38        Beard, 1960
WISC MA                                .69        Kuhn, 1976
WISC Full Scale IQ                     .43        Elkind, 1961
WISC Verbal IQ                         .47        Elkind, 1961
WISC                                   .69-.84    Hathaway, 1972
Raven's Matrices                       .60        Tuddenham, 1970
Peabody Picture Vocabulary Test        .21        Tuddenham, 1970
Peabody Picture Vocabulary Test        .47        Gaudia, 1972
Peabody Picture Vocabulary Test        .28        De Avila & Havassy, 1974
Peabody Picture Vocabulary Test        .31        Klippel, 1975
Lorge-Thorndike MA                     .62        Kaufman, 1970, 1971
Lorge-Thorndike IQ                     .55        Kaufman, 1970, 1971
Gesell School Readiness Test           .64        Kaufman, 1970, 1971
IQ-Unspecified Test                    .24-.34    Dodwell, 1962
Mean r                                 .49

Scholastic Achievement
Reading (SAT)                          .58        Kaufman & Kaufman, 1972
Reading (SAT)                          .42        Garfinkle, 1975
Arithmetic (SAT)                       .50        Garfinkle, 1975
Arithmetic (SAT)                       .60        Kaufman & Kaufman, 1972
Mathematics (MAT)                      .18-.41    De Vries, 1974
Arithmetic Grades                      .52        Goldschmidt, 1967
Composite Achievement (SAT)            .64        Kaufman & Kaufman, 1972
Composite Achievement
  (California Achievement Test)        .63        Dudek et al., 1969
Mean r                                 .55

Source: Jensen, A.R. Bias in mental testing. New York: The Free
Press, 1980.

Jensen (1980) also reports that Piagetian tests show
social class and ethnic group differences in the United States,
with children from low SES backgrounds about as far behind on
the Piagetian measures as on more traditional IQ tests (e.g.,
Almy, 1970; Almy, Chittenden, & Miller, 1966; Figurelli &
Keller, 1972; Tuddenham, 1970; Wasik & Wasik, 1971). Jensen
(1980) notes:

     in all such comparisons of group measures, one must
     take into account the small number of items of the
     Piagetian tests, which tends greatly to attenuate
     mean differences expressed in σ units or standard
     score units. When this is properly taken into
     account, in terms of item discriminabilities and
     inter-item correlations, it turns out that the
     Piagetian tests show larger white-black differences
     than the Stanford-Binet or other conventional IQ
     tests. I figure the white-black mean difference in σ
     units would be about 20 percent larger than the
     Stanford-Binet IQ difference on Piagetian tests of
     comparable length to the Stanford-Binet. But
     while Piagetian tests tend to magnify the
     white-black difference, they tend to diminish the
     differences between whites and Mexicans and
     Indians, and Orientals tend to surpass whites in
     Piagetian performance. Interestingly, Arctic
     Eskimos surpass white urban Canadian children on
     Piagetian tests, and Canadian Indians do almost as
     well as Eskimos (p. 676).
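
The attenuation argument in this passage can be illustrated with classical test theory. This is a rough sketch under stated assumptions, not a reconstruction of Jensen's own calculation, which he describes as based on item discriminabilities and inter-item correlations. It assumes (a) the Spearman-Brown formula for the reliability of a lengthened test and (b) that a standardized mean difference on observed scores equals the true-score difference multiplied by the square root of the reliability. The function names and the numbers in the usage lines are hypothetical.

```python
import math


def spearman_brown(reliability, length_factor):
    """Projected reliability when a test is lengthened by `length_factor`."""
    if not 0.0 < reliability <= 1.0:
        raise ValueError("reliability must be in (0, 1]")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (
        1.0 + (length_factor - 1.0) * reliability
    )


def projected_difference(observed_d, reliability, length_factor):
    """Standardized mean difference expected at the new test length.

    Assumes observed_d = true_d * sqrt(reliability), so the difference
    scales with the square root of the ratio of the two reliabilities.
    """
    new_reliability = spearman_brown(reliability, length_factor)
    return observed_d * math.sqrt(new_reliability / reliability)


# Hypothetical values: a short test with reliability .50, doubled in length.
short_test_reliability = 0.50
doubled = spearman_brown(short_test_reliability, 2)
print(f"reliability at double length: {doubled:.3f}")
print(f"difference of 0.80 becomes: "
      f"{projected_difference(0.80, short_test_reliability, 2):.3f}")
```

The point of the sketch is only directional: a short, less reliable test understates a group difference expressed in standard deviation units relative to a longer test measuring the same thing.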

Learning-Potential Assessment

Generally, the learning-potential approach views
assessment as an examination of learning and strategies which
facilitate acquisition of new information or skills (cf.
Kratochwill, 1977). Learning-potential assessment bears
similarity to Piaget's work on intellectual development
(Haywood, Filler, Shifman, & Chatelanat, 1974). That is,
within the Piagetian paradigm, intelligence is viewed as a
process rather than a static entity unmodifiable by
experience.

Work in the learning potential area has been affiliated
with Haywood and his associates in Nashville, Tennessee;
Budoff in Cambridge, Massachusetts; and Feuerstein and his
associates in Jerusalem. These investigators and their
colleagues have adapted test-based models for assessment and
intervention of the mentally retarded and/or learning
disabled (see Haywood et al., 1974, and Kratochwill, 1977, for
overviews). Haywood et al. (1974) noted that verbal
abstraction abilities can be improved in mental retardation
associated with culturally different environments during
actual assessment. For example, some research indicates that
mentally retarded clients are able to perform better on
Wechsler's Similarities subtest when examples of each concept
are provided (Gordon & Haywood, 1969). These results
apparently replicate with retarded children and adults from
culturally different environments (Haywood & Switzky, 1974).

The learning-potential work of Budoff and his associates
has used a test-train-retest assessment paradigm on such
instruments as the Kohs Block Design Test (Budoff, 1967),
Wechsler Performance Scale (Budoff, 1969), Raven's
Progressive Matrices (Budoff & Hutton, 1972), and a
modification of Feuerstein's (1968) early Learning Potential
Assessment Device (Budoff, 1969). These tasks are
sensitive to modification via instruction or coaching, and
typically assessment can yield three types of performers.
High scorers gain little from coaching. Those who initially
score low and demonstrate performance gains following
instruction are labeled gainers. Nongainers initially score
low but do not show gains following training (see Budoff,
Meskin, & Harrison, 1971). A major implication of Budoff's
work has been that:

     A large proportion of IQ-defined retardates, who
     come from low income homes and have no history of
     brain injury, show marked ability to solve these
     tasks when they are presented in the learning
     potential assessment format. The data indicate
     that the more able students by this criterion are
     educationally, not mentally, retarded, and the
     ability they demonstrate prior to or following
     tuition is not specific to the particular learning
     potential task (Budoff, 1972, p. 203).
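
The three performer types produced by the test-train-retest paradigm can be expressed as a simple decision rule. The sketch below is illustrative only: the text does not report the cutoffs used by Budoff and his associates, so `high_cutoff` and `min_gain` are hypothetical parameters that a user would have to set, and the function name is invented for this example.

```python
def classify_performer(pretest, posttest, high_cutoff, min_gain):
    """Label one examinee from a test-train-retest sequence.

    pretest, posttest: scores before and after coaching.
    high_cutoff: assumed pretest score at or above which the examinee
        counts as a high scorer (hypothetical threshold).
    min_gain: assumed minimum posttest-minus-pretest improvement that
        counts as a gain (hypothetical threshold).
    """
    if pretest >= high_cutoff:
        return "high scorer"
    if posttest - pretest >= min_gain:
        return "gainer"
    return "nongainer"


# Made-up scores for three examinees, with assumed thresholds.
examinees = {"A": (30, 31), "B": (10, 20), "C": (10, 12)}
for name, (pre, post) in examinees.items():
    label = classify_performer(pre, post, high_cutoff=25, min_gain=5)
    print(name, label)
```

Note that the rule checks the pretest first, so an examinee who starts above the cutoff is a high scorer regardless of the size of any later gain.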

There is some empirical work on the learning potential
strategy. For example, Budoff and Hutton (1972) found that
if they provided only an hour of structured experiences in
problem solving to children who initially scored low on the
Raven's, 50 percent of these low performance children scored
at the 50th percentile (or above) on a posttest administered
after training. These gainers represented minority groups.
Similar results have been found with learning disabled
children (e.g., Piatt, 1976; Swanson, 19  ). Sewell and
Severson (1974) also found that the Raven Progressive
Matrices (see Budoff & Friedman, 1964) usefully
differentiated low SES black children who could profit from
learning experiences. Nevertheless, unanswered questions in
this area relate to how learning potential assessment yields
prescriptive information for classroom instruction,
especially in various academic content areas (math, reading,
etc.), and to the generalizability of the training (cf.
Kratochwill, 1977).

Another area within the learning potential paradigm is
represented in the work of Feuerstein and his associates
(Feuerstein, 1968, 1970; Feuerstein & Rand, 1978). Like the
test-train-test paradigm of Budoff and his associates,
Feuerstein's strategy is designed to promote the best
possible learning and motivational conditions for the child.
The Learning Potential Assessment Device (LPAD) is designed
to assess what an individual can learn rather than the
traditional inventory of what one has learned and current
problem-solving ability (Feuerstein & Rand, 1978).

A detailed discussion of the LPAD can be found in
Feuerstein, Rand, and Hoffman (1979). The LPAD is usually
employed for individual assessment, but a group version has
been developed. In the group test students are assessed on
tasks that become progressively more difficult. The
conceptual framework for the LPAD is as follows:

     The assessment of learning potential differs from
     that of standardized psychometric techniques in a
     number of significant ways. The primary
     difference lies in the conceptual foundations upon
     which the assessment is based. In place of the
     static goals generated by conventional psychometric
     theory and techniques which determine the nature
     and structure of its measuring instruments, the
     LPAD and its theoretical framework, the cognitive
     map, generate dynamic goals which reflect the
     underlying dimensions of the adaptive processes
     involved in intelligent activities. In terms of
     the actual techniques employed, the central purpose
     is again very different. Tests that yield IQ
     measures are constructed to provide a reflection of
     an individual's manifest level of performance
     relative to other individuals within a
     representative normally distributed population.
     The LPAD is geared toward producing changes within
     the individual during the testing situation in
     order to permit an ongoing assessment of that
     individual's ability to learn and change relative
     to his/her own optimal levels (Feuerstein, Miller,
     Rand, & Jensen, 1981, pp. 202-203).

Feuerstein et al. (1980) note that learning potential
assessment requires four specific conceptual shifts from
traditional testing.

1.  A shift from product to process orientation. In
    this regard, the LPAD is designed to alter the
    individual's performance during the actual
    assessment.

Figure 7.1 - The Learning Potential Assessment Device Model
(Source: Feuerstein, R., Miller, R., Rand, Y.,
& Jensen, M.R. Can evolving techniques better
measure cognitive change? The Journal of Special
Education, 1981, 15, 201-219. Reproduced by
permission).

Figure 7.2 - Example of Raven's Matrices variations based on
LPAD Model (Source: Feuerstein, R., Miller, R.,
Rand, Y., & Jensen, M.R. Can evolving techniques
better measure cognitive change? The Journal of
Special Education, 1981, 15, 201-219. Reproduced
by permission).

2.  The test structure includes the conceptual features
    of the cognitive map. Figure 7.1 shows the
    structural model of the LPAD and Figure 7.2 presents
    an example of the test instrument. The task is
    presented in a variety of modalities (e.g.,
    numerical, verbal, figural) and could require a
    number of different operations (e.g., analogy,
    classification, and seriation). The student is
    given training to solve a particular problem and it
    is assumed that training provides a temporary
    correction of deficient prerequisite functions.
    Data like the following are obtained:

    a.  The capacity of the examinee to grasp the
        principle underlying the initial problem and to
        solve it;

    b.  The amount and nature of investment required to
        teach the examinee the given principle;

    c.  The extent to which the newly acquired principle
        is successfully applied in solving problems that
        become progressively more different from the
        initial task;

    d.  The differential preference of the examinee for
        one or another of the various modalities of
        presentation of a given problem; and

    e.  The differential effects of different training
        strategies offered to the examinee in the
        remediation of his/her functioning; these
        effects are measured by using the criteria of
        novelty-complexity, language of presentation,
        and types of operation (Feuerstein et al.,
        1979).

3.  The test situation is changed to one of a
    test-teach-test procedure. Factors that produce
    change, including actively instructing the student,
    giving feedback concerning success and failure,
    engaging the child in the learning process, promoting
    intrinsic motivation, interpreting performance, and
    individualizing the test situation, are used during
    the actual assessment.

4.  The results of the LPAD are interpreted differently.
    For example, considerable weight is given to
    excellent responses, and unexpected responses are
    pursued (i.e., a process approach is taken).

More recently, Arbitman-Smith and Haywood (1980)
described an educational program that is used to enhance the
learning skills of slow learners and learning disabled
students. The approach, developed by Feuerstein and his
colleagues (Feuerstein, 1979; Feuerstein, Rand, Hoffman, &
Miller, 1980), is called Instrumental Enrichment (IE).
Arbitman-Smith and Haywood (1980) describe the IE program:

     It not only encourages students to engage in a
     learning process that is different from their past
     experiences and thus not associated with past
     failures, but also gives teachers an organized,
     structural framework with which to teach a variety
     of problem-solving processes. The program
     emphasizes processes by which various problems are
     solved rather than merely getting "right" answers.
     Basic cognitive operations such as evaluation,
     interpretation, planning, and comparison are
     consistently taught through the program and the
     students are made aware of their own thought
     processes. This consists currently of 15 teaching
     instruments, each focused upon (but not limited to)
     a specific deficient cognitive function. The
     program constitutes in the aggregate approximately
     [?]50 to 300 hours of classroom instruction, 3 to 5
     hours a week, to be spread over at least two years
     (p. [?]3).

More detailed information on the content and goals of the IE
program can be found in Feuerstein et al. (1980).

The IE program has been evaluated in Israel (Feuerstein,
Rand, Hoffman, & Miller, 1980) and in the United States and
Canada (Haywood & Arbitman-Smith, 1980). Arbitman-Smith and
Haywood (1980) provided an overview of some preliminary data
on cognitive education of learning disabled students. The
authors reported that data have been collected on two
one-year contracts. They note that there were generally no
first year effects on school achievement, but some changes in
intellectual functioning were found (see Haywood &
Arbitman-Smith, 1980, for an overview). With their LD
population, Arbitman-Smith and Haywood (1980) report that
students "indicated interest and motivation to learn the IE
program materials and actively participated in the
discussions, a form of behavior not often exhibited in their
regular classes" (p. 62). Unfortunately, even these effects
may not have been due to the program, since no control for
such effects was included in the study.

Some writers have noted that the work in the learning
potential assessment area appears promising for
nondiscriminatory or nonbiased assessment (Alley & Foster,
1978; Laosa, 1977; Mercer & Ysseldyke, 1977). Indeed, Mercer
and Ysseldyke (1977) include the learning potential
assessment paradigm as part of the pluralistic assessment
"model." Feuerstein et al. (1979) have noted that the LPAD
provides a more fair assessment of minority students than
attempts to adapt conventional psychometric tests. They note
that such techniques as culture-free, culture fair,
developmental tests, and the SOMPA procedures perpetuate a
confusion between manifest performance and potential (p.
203).

There is some empirical research on the LPAD (see
Feuerstein et al., 1979, for a detailed review). Research
has been conducted with disadvantaged children, on
homogeneous versus heterogeneous grouping, and on assessment
of culturally different immigrants. In the latter group,
Feuerstein et al. (1981) report that minimum training
provided by the group test procedure produced substantial
learning potential and higher levels of cognitive
modifiability. More traditional measures apparently
reflected cultural differences and not differences in ability
to learn or profit from instruction.

Although the LPAD and the IE program appear somewhat
promising, there is still little evidence to suggest that
these measures are any less biased than traditional measures.
Their merit presumably lies in teaching during the actual
testing session, but it is not known if the cognitive
strategies that are trained have any relation to the child's
performance in the classroom setting.

Diagnostic/Clinical Teaching

An area that bears similarity to the learning potential
strategies is called "diagnostic/clinical teaching"
(Kratochwill, 1977; Lerner, 1976). These strategies differ
from "diagnostic-prescriptive teaching" (Ysseldyke, 1973;
Ysseldyke & Salvia, 1974; Salvia & Ysseldyke, 1978), which
has been affiliated with test-based aptitude-treatment
interaction (ATI) paradigms (e.g., Cronbach & Snow, 1976;
Levin, 1977). Diagnostic teaching actually embraces a number
of different strategies which are, at present, not guided by
any particular theoretical area. Typically, diagnostic
teaching involves the actual teaching of curriculum-related
material under conditions that maximize learning (e.g.,
stimulus materials, mediational strategies, reinforcement,
feedback). Thus, their relevance in the nondiscriminatory
assessment area is that they focus on tasks nearly all
children experience in the school curriculum and thus
on direct intervention for successful curriculum mastery.
For example, Myers and Hammill (1969, 1973) recommended
teaching words to children under conditions that maximize
learning and suggested that learning disabled children should
be evaluated on learning tasks from which norms are
established.

Likewise, Hutson and Niles (1974) proposed trial
teaching as a supplement to traditional testing. Severson
(1971, 1973) suggested a process learning assessment strategy
based on teaching academic content under different
conditions. In research tasks employing from four to eight
words to be learned, predictive validity relations have
ranged from .30 to .73 with achievement test criteria (see
Kratochwill & Severson, 1977; Sewell & Severson, 1974).

More recently Sewell (1979) examined the predictive
effectiveness of intelligence tests and learning tasks for
first grade black and white children. The study focused on
the relative merits of learning tasks in contrast to
traditional IQ tests in predicting academic achievement. The
learning tasks involved diagnostic teaching, paired associate
learning, and a learning potential assessment using the
Raven's Coloured Progressive Matrices in a
pretest-coaching-posttesting format. The diagnostic teaching
condition involved teaching the children [?] words under three
different conditions that proceeded from feedback to social
praise to tangible reinforcement. The Stanford-Binet served
as the measure of intelligence and the California Achievement
Test as the criterion. The results indicated that the IQ
measure correlated moderately with achievement in both groups,
that the IQ was a more reliable predictor for the white
children than for the blacks, and that for the black children
certain learning tasks were better predictors than IQ. The
study does show that although the IQ was a significant
predictor for both groups, both this measure and the learning
tasks were better predictors of achievement for the
middle-class children.

While these procedures represent a promising area within
nondiscriminatory assessment, a paucity of research and a
limited range of content remain limitations (cf. Kratochwill,
1977). The diagnostic teaching procedures are also said to
provide information useful for prescriptions (e.g., Sewell,
1979, 1981). Unfortunately, it is not at all clear how
specific educational programs are to be developed from the
diagnostic teaching procedure. Presumably, information on
how the child learns academic material under various
conditions of reinforcement can be obtained. However, to date,
the research on diagnostic teaching has involved a rather
limited set of dimensions that are known to influence the
learning process. It is doubtful that assessment under the
usual conditions of diagnostic teaching will generalize to
the child's learning in other settings such as the classroom.

Child Development Observation

Within a tradition similar to the learning potential
assessment and diagnostic teaching is the "Child Development
Observation" (CDO) designed by Ozer and his associates (Ozer,
1966, 1968, 1978; Ozer & Dworkin, 1974; Ozer & Richardson,
1972, 1974). A major objective of CDO is to simulate the
process of learning on protocols that sample conditions under
which a given child's learning problems may be solved.
Different teaching strategies are also enacted to see how the
child best learns.

The CDO procedure may be useful in nonbiased assessment
in that it does not conform to traditional testing paradigms;
no score is derived in relation to a norm group; decisions do
not promote diagnostic labeling; and relating assessment data
to classroom functioning is intrinsic to evaluation (Ozer &
Richardson, 1972). However, there are no data on the
reliability and validity of the procedure; verbal skills are
heavily emphasized in certain areas of assessment; and the
CDO does not systematically sample from classroom tasks (cf.
Kratochwill, 1977). Like the other process oriented measures
of learning potential and diagnostic teaching, the CDO may
not reflect the conditions under which learning usually
occurs in the child's usual educational environment.


Clinical Neuropsychological Assessment

The field of neuropsychology is concerned with
delineating brain-behavior relations. Neuropsychology
includes a number of different, sometimes only remotely
related, disciplines of which clinical neuropsychology is but
one. Clinical neuropsychology focuses on developing
knowledge about human brain-behavior relations, or
delineating the psychological correlates of brain lesions
(Davison, 1974; Reitan, 1966). Intellectual, sensory-motor,
and personality deficits are measured and related to brain
lesions or to brain damage in the broader sense of
physiological impairment. The work in this area is rooted in
academic psychology, behavioral neurology, and particularly
in the psychometric field in psychology.

Within clinical neuropsychology there is a dependence
upon standardized behavioral observations emphasizing
normative psychological assessment devices. Within this
context, behavior is defined operationally and, usually,
quantified along continuous distributions (Davison, 1974).
The clinical neuropsychologist is typically not merely
concerned with distinguishing brain damage from other
conditions. Rather, the interest lies in refining
descriptions of clinical conditions including inferences
relative to location and extent of brain damage, as well as
probable medical and psychological conditions accounting for
the abnormal behavior.

A considerable amount of information has been obtained
during the past decade about the behavioral characteristics
of brain-damaged persons as a result of neuropsychological
study in the areas of mental retardation, learning
disabilities, behavioral disabilities, and convulsive
disorders (cf. Reitan & Davison, 1974). In addition, studies
have been conducted on individuals with confirmed cerebral
lesions independently of whether these individuals manifest
learning or behavioral problems (Reitan, 1974). Finally,
neuropsychological studies of normal children have been
undertaken (e.g., Kimura, 1967).

Further research in neuropsychological assessment
techniques could lead to some interesting applications
relative to identification, classification, and intervention
strategies. The concept of brain dysfunction as a primary
factor in learning disabilities (LD), for example, has
received increasing attention over the past 20 years. By
characterizing all children having learning disabilities as
having minimal brain dysfunction, many professionals seem to
have attributed LD to neurogenic factors. However, much of
the research relevant to this hypothesized relationship is
clouded by the problem that LD children do not constitute a
homogeneous group (see, for example, Hallahan & Kauffman,
1978).

Typically, the term "learning disabilities" has been
used to refer to children who show a discrepancy between
current levels of school performance and measures of academic
potential which is not due to mental retardation; cultural,
sensory, or educational inadequacies; or serious behavioral
disturbances (cf. Bateman & Schiefelbusch, 1969). This type
of general definition lacks sufficient objective criteria, so
that children who have specific disabilities in reading,
spelling, arithmetic or multiple deficits are all categorized
as LD children. Moreover, each type has often been referred
to using the term "minimal brain dysfunction." The lack of
precision with which professionals have used the terms
minimal brain dysfunction and learning disability may
partially account for the inconsistencies found in
identification and placement practices.

Recent studies in neuropsychological assessment
techniques, such as one conducted by Ahn (1977), offer the
promising possibility for development of a multiple
discriminant function utilizing relevant information for more
precise classification of large groups of learning disabled
children. Ahn (1977) found significant patterns of
difference between three different groups of presumably
learning disabled children (i.e., verbal underachievers,
arithmetic underachievers, and mixed underachievers) and
normal children in quantitative electrophysiological measures
(i.e., electroencephalographic evoked potentials).

Results such as these lend plausibility to the
contention that neuropsychological assessment techniques may
prove useful for more accurate identification and
classification of children possessing different specific
learning disabilities. At the very least, further research
in this area should increase educators' and psychologists'
knowledge of the many different types of problems referred to
under the general label of "learning disabilities."

Davison (1974) has discussed the potential utility of
clinical neuropsychological assessment techniques relative to
intervention. Of particular import here is the fact that the
same behavioral deficits may be due to differing causal
factors and, therefore, require very different interventions.
A reading problem, for example, may be due to an abnormal
learning history or to a structural abnormality of the brain.
Thus, for remedial purposes, the etiology of a particular
deficit may take on importance. One problem with traditional
methods of psychodiagnostic assessment is that they are not
typically able to differentiate among the many possible
etiological factors involved in a particular disability.

One cannot accurately predict the outcomes of further
investigations into this area as yet. Increasing our
understanding of brain-behavior relationships will require
extensive study of the behavior of humans with brain damage
of varying location, extent, etiology, etc. It may be that
the product will be merely some interesting descriptive
statistics. Undoubtedly, however, increased knowledge of
brain-behavior correlates holds potential implications for
nondiscriminatory assessment techniques as well as decisions
based on assessment data. Much additional work needs to be
conducted with children to make these procedures more
applicable to the area of nonbiased assessment.

Behavioral Assessment Strategies

We have already provided a relatively thorough
discussion of the differences between behavior assessment and
traditional assessment (see Chapter 3). Nevertheless, the
reader is referred to some excellent overviews of this issue
(e.g., Ciminero, 1977; Goldfried, 1976; Goldfried & Kent,
1972; Goldfried & Linehan, 1977; Goldfried & Sprafkin, 1974;
Mash & Terdall, 1981; Kanfer & Saslow, 1969; Mischel, 1968).
Behavioral approaches emphasize a careful examination of
environmental antecedents and consequents, as related to a
specific response repertoire. Essentially, such an analysis
is based on the operant tradition (Bijou & Grimm, 1975; Bijou
& Peterson, 1971; Browning & Stover, 1971; Gelfand &
Hartmann, 1975). However, as Evans and Nelson (1977) have
observed, the strict operant approach can be unduly
restrictive. A more global approach, given the current
practices in psychology and education, is to outline how a
functional analysis approach can utilize more traditional
psychometric practices, a rapprochement between traditional
psychometrics and social learning theory called "social
behavioral psychometrics" by Staats (1975). Moreover, a
cognitive functional approach as outlined by Meichenbaum
(1977) seems especially useful in some areas of psychological
assessment. In comparing this approach to a more conventional
functional analysis, Meichenbaum (1977) suggests:

     A cognitive-functional approach to psychological
     deficits is in the same tradition but includes and
     emphasizes the role of the client's cognitions
     (i.e., self statements and images) in the
     behavioral repertoire. In short, a functional
     analysis of the client's thinking processes and a
     careful inventory of his cognitive strategies are
     conducted in order to determine which cognitions
     (or the failure to produce which key cognitions),
     under what circumstances, are contributing to or
     interfering with adequate performance (p. 236).

In a later section of this chapter the cognitive functional
analysis strategy will be described in more detail. This
strategy as well as others should be conceptualized as
expanding the assessment base of psychoeducational behavior
assessment.

Conceptual Framework for Behavioral Assessment

With increasing diversity in behavioral assessment, a
conceptual framework for classifying behavioral measures is
helpful to organize methods and what they are designed to
assess. Cone (1977, 1978) and Cone and Hawkins (1977)
developed a conceptual framework and a taxonomy called the
Behavioral Assessment Grid (BAG). It is based on the
simultaneous consideration of three aspects of the behavioral
assessment process: (a) the contents assessed, (b) the
methods used to assess them, and (c) the types of
generalizability (i.e., reliability and validity) established
for the measure being employed. The relations among these
three aspects of assessment are presented in Figure 7.3.

BEHAVIORAL ASSESSMENT GRID

Figure 7.3 - The Behavioral Assessment Grid (BAG), a taxonomy of
behavioral assessment integrating contents, methods, and universes
of generalization (Source: Cone, J.D., The Behavioral Assessment
Grid (BAG): A conceptual framework and a taxonomy. Behavior
Therapy, 1978, 9, 882-888. Copyright 1978 by Association for
Advancement of Behavior Therapy. Reproduced by permission).

Contents. Behavioral assessment is commonly conceptualized
in three content areas (Cone, 1977; Cone & Hawkins, 1977),
systems (Lang, 1968, 1971, 1977) or channels (Paul &
Bernstein, 1973). The contents are most commonly referred to as
motor, physiological, and cognitive (see Morris & Kratochwill,
1983). Although there are some basic disagreements as to what is
specifically included within these categories, we will present the
scheme developed by Cone (1978) because it serves a heuristic
function.

Motor content is one of the most frequently used content
areas and includes activities of the striate musculature typically
observable without special instrumentation. Included in this
content area would be such activities as walking, running,
jumping, talking and other motor components.

Physiological contents, according to Lang (1971), include
activities of muscles and glands autonomically innervated and
tonic muscle activity. Some examples of physiological content are
muscle tension, heart rate, respiration, and galvanic skin
response. Such measures are usually assessed through special
instrumentation.

Cognitive contents are defined in the context of the
particular referents used. Thus, while verbal behavior
(self-report) can be categorized as motoric when one is referring
to the speech act (see above), the referents may be motor,
cognitive, or physiological. When verbal behavior refers to
private events (e.g., feelings, thoughts) the referents are
cognitive, but when it refers to a publicly verifiable behavior,
the referent is either physiological or motor. When conducting
behavioral assessment, the assessor should be concerned about the
relations among the three content areas. This means that in a
particular situation an individual may respond cognitively,
motorically, and/or physiologically. Some evidence suggests that
the three content areas are not necessarily highly intercorrelated
(see Bellack & Hersen, 1977a, 1977b; Cone, 1976a; Hersen &
Bellack, 1978), but reasons for this remain somewhat unclear
(Hugdahl, 1981; Kozack & Miller, 1982). Part of the difficulty in
research investigating the relations among the three systems may
be related to methodological problems. For example, Cone and
Hawkins (1977) argued that comparisons of the three systems have
confounded method of assessment with behavioral content. This
problem occurs when self-report measures of cognitive activities
are compared to direct observation measures of behavior. A child
may be trembling but may report that he/she is not frightened.
This could be assessed through self-report measures and direct
observation, but a low correlation between content areas may be
due to content differences or method differences, or both.

A second problem in this area is related to definitions of
the three response systems (Hugdahl, 1981; Kozack & Miller, 1982).
Such individuals as Lang and Paul and Bernstein have based their
definitions on hypothetical constructs. For example, when Lang
(1968, 1971) discusses the three response systems in the context
of measuring fear, the response is presumed to underlie a variety
of behaviors such as escape and avoidance. In contrast to this
view, Cone and his associates (Cone, 1975, 1976b, 1978; Cone &
Hawkins, 1977) prefer the conceptualization in which each content
area is examined within the context of stimulus and consequent
variables present in any given situation. This latter strategy
seems most useful in advancing work in this area, although this
still remains debatable.

Methods. Different methods are used to gather data across
each of the three content areas. Cone (1977, 1978) ordered these
assessment methods along a continuum of directness representing
the extent to which they (a) measure the target behavior of
clinical relevance and (b) measure the target behavior at the time
and place of its natural occurrence.

The methods are categorized into direct and indirect
dimensions. Interviews and self-reports are at the indirect end
of the continuum because the behavior is considered a verbal
representation of more clinically relevant activities taking place
at some other time and place. Moreover, ratings by others are
included in the indirect category because they typically involve
retrospective descriptions of behavior. In contrast to direct
observation, a rating of a behavior occurs subsequent to the
actual occurrence of the behavior.

Included within the direct assessment methods are self-
monitoring, analog: role play, analog: free behavior,
naturalistic: role play, and naturalistic: free behavior. These
dimensions are organized according to who does the observing, the
instructions given, and where the observations occur. In
self-monitoring the observer and the observee are the
same individual. Self-monitoring differs from self-report in that
an observation of the behavior occurs at the time of its natural
occurrence. Analogue assessment refers to settings or situations
that are analogous to, but not the same as, the natural
environment. In this type of assessment, the client may be
instructed to role play a particular behavior or act normally, as
if he/she were in the natural environment. Technically, analogue
assessments can vary along a number of dimensions (Kazdin, 1980).
Finally, assessment may be scheduled in the natural environment
under either role play or completely naturalistic conditions.
Each of the eight assessment methods is discussed in more detail
in this chapter.

Universes of generalization. The various measures are also
indexed in terms of the different ways in which scores can be
generalized across six major universes: (1) scorer, (2) item, (3)
time, (4) setting, (5) method, and (6) dimension (see Figure 7.3).
The basis for this framework is generalizability theory as
discussed by several authors (Cone, 1977, 1978; Cronbach, Gleser,
Nanda, & Rajaratnam, 1972; Jones, Reid, & Patterson, 1975;
Wiggins, 1973). Scorer generality refers to the extent to which
data obtained by an assessor (or scorer) are comparable to the mean
of the observations of all scorers that might have been observing
the behavior. Essentially, this concern is one of the agreement
between assessors on observations of behavior. When two
individuals agree, scores are said to generalize across the
scorers.
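Agreement between two assessors can be quantified in several
ways. The following is a minimal sketch of one simple index,
interval-by-interval percent agreement, assuming each observer
records the occurrence (1) or nonoccurrence (0) of a target
behavior in the same sequence of observation intervals. The
function name and the records are hypothetical and serve only as
an illustration.

```python
def percent_agreement(observer_a, observer_b):
    """Percentage of intervals in which two observers made the same record."""
    if len(observer_a) != len(observer_b) or not observer_a:
        raise ValueError("records must be non-empty and of equal length")
    agreements = sum(1 for a, b in zip(observer_a, observer_b) if a == b)
    return 100.0 * agreements / len(observer_a)


# Hypothetical records for ten intervals (1 = behavior occurred).
observer_1 = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
observer_2 = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]

agreement = percent_agreement(observer_1, observer_2)
print(f"interobserver agreement: {agreement:.0f}%")
```

Percent agreement does not correct for agreements expected by
chance, so it is best read as a rough index of scorer generality.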

Item generalization refers to the extent to which a given
response is representative of those of a larger universe of
similar responses. In behavioral assessment item generalization
could be used in self-report instruments as when scores on
odd-numbered items parallel those of even-numbered ones.
Moreover, in behavioral observation odd-even scores might be
compared during various phases of baseline and treatment
assessment.
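The odd-even comparison described above can be sketched as
follows, assuming a self-report instrument scored item by item
and a Pearson correlation between the two half-scores as the
index of correspondence. The function names and the response
matrix are hypothetical and are offered only as an illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


def odd_even_halves(item_scores):
    """Return per-respondent totals for odd- and even-numbered items."""
    odd = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return odd, even


# Hypothetical responses: five children, six items each.
responses = [
    [3, 2, 3, 3, 2, 3],
    [1, 1, 2, 1, 1, 2],
    [4, 4, 3, 4, 4, 4],
    [2, 3, 2, 2, 3, 2],
    [0, 1, 1, 0, 0, 1],
]

odd_totals, even_totals = odd_even_halves(responses)
split_half_r = pearson_r(odd_totals, even_totals)
print(f"odd-even correlation: {split_half_r:.2f}")
```

A high odd-even correlation suggests that scores generalize
across the items sampled; the same comparison could be applied to
odd and even observation sessions.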

Generalization across time refers to the extent to which data
collected at one point in time are representative of data that
might have been collected at other times. Generally, behavioral
assessors are concerned with the consistency of behavior across
time, particularly within the context of stability in an
intervention program.

Setting generality refers to the extent to which data
obtained in one situation are representative of those obtainable
in others. A behavioral assessor would be concerned with the
degree of generality of a behavior across settings, such as from
Classroom A to Classroom B.

The method generality of assessment refers to the extent to
which data from different methods of measuring a target behavior
produce consistent results. Cone (1977) notes, "the method
universe of generalization deals with the issue of the
comparability of data produced from two or more ways of measuring
the same behavioral content" (p. 420). Here behavioral assessors
might be concerned with the general correspondence between
measures of self-report and direct observation of behavior.

Dimensions generalization refers to the comparability of data
on two or more different behaviors. When scores on a particular
measure of one behavioral dimension relate to scores on other
variables for the same clients, the scores are said to belong to a
common universe.

Behavioral Assessment Methods

Our discussion will now focus on the most common methods of
behavioral assessment, including (a) behavioral interviews, (b)
self-report, (c) problem checklists and rating scales, (d)
analogue measures, and (e) direct observation procedures. In
addition to this list, psychophysiological procedures,
criterion-referenced testing, and more traditional
psychoeducational testing are discussed within the context of
their use in behavioral assessment.

While any one of these procedures might be used to assess a
child's learning problems, this would be a rare instance. More
likely, it is some combination of procedures and devices that
provides an adequate data base for intervention. Moreover, it is
the novel application of psychoeducational assessment procedures
rather than their routine application that will provide useful
information for educational programming. In this regard, it is
the routine and stereotyped battery of assessment procedures that
will likely lead to erroneous conclusions about intervention.

Interview Assessment. The interview method of gathering data
is perhaps one of the most common methods used in behavioral
assessment. The interview assessment method has also been used
widely in traditional psychotherapy and education (e.g., Benjamin,
1974; Bingham, Moore, & Gustad, 1959; Fear, 1973; Grant & Bray,
1969; Kahn & Cannell, 1957; Matarasso & Wiens, 1972; McCormick &
Tiffin, 1974; Sullivan, 1954; Ulrich & Trumbo, 1965). Behavior
assessors have also regarded the interview as an important
clinical assessment technique (e.g., Ciminero, 1977; Ciminero &
Drabman, 1977; Mash & Terdall, 1981; Meyer, Liddel, & Lyons, 1977;
Linehan, 1977; Marholin & Bijou, 1978; Morganstern, 1976).
However, even with interest in this area, concerns have been
raised over the reliability and validity of the technique.
Ciminero and Drabman (1978) noted "...the data available at this
time suggest that we must be very cautious, if not skeptical, of
interview data for children and parents" (p. 56).

This conclusion appears warranted, especially in light of the
paucity of research on behavioral interviewing and the informal
strategies by which behavioral interviews are commonly conducted
(Kratochwill, 1982). The lack of research on interviewing is
generally well known (cf. Bergan, 1977; Ciminero & Drabman, 1978;
Linehan, 1977). Also, while some systems present a conceptual
framework for the behavioral interview (e.g., Kanfer & Grimm,
1977; Holland, 1970; Kanfer & Saslow, 1969), few formal script
guidelines are provided and the assessor usually does not have a
format for what specific questions should be asked at what point.

The compilation of data during the interview should yield a
good basis for decisions about the areas in which intervention is
needed, the particular targets for further assessment, some
tentative targets for intervention, methods, and goals.
The interview can provide one of the first contact points for
providing descriptions of learning and behavior problems,
identification of specific behaviors needing modification, as well
as variables (antecedents and consequences) controlling learning
and social behaviors. A major contribution regarding what should
be covered in a behavioral interview was presented by Kanfer and
Saslow (1969). The authors noted that their guidelines can
provide not only the initial information collected from the
client, but also data relevant to formation of a treatment plan
(see also Meyer, et al., 1977, for a similar proposal). [An
outline of the approach presented by Kanfer and Saslow (1969, pp.
430-437) includes the general components that are useful for
assessment of learning and behavior problems.]

More recently, Kanfer and Grimm (1977) proposed a
differentiation of controlling variables and behavioral
deficiencies into categories that can be matched with various
intervention strategies. Their five categories include: (1)
behavior deficiencies, (2) behavioral excesses, (3) inappropriate
environmental control, (4) inappropriate self-generated stimulus
control, and (5) problematic reinforcement contingencies. The
authors further indicate for each category: (a) briefly which kind
of statements serve to define a particular behavioral problem as a
member of each category, (b) examples of target behaviors in a
class, and (c) briefly what therapeutic variables are available
for change. As with many conceptual systems for interviewing,
specific strategies for intervention can be found in the applied
literature (e.g., Bandura, 1969; Goldfried & Davidson, 1976;
Kanfer & Phillips, 1970; Kanfer & Goldfried, 1975; Mahoney, 1974;
Sulzer-Azaroff & Mayer, 1977; Thoresen & Mahoney, 1974; Rimm &
Masters, 1974).

Only one behavioral system has been developed for
comprehensive interviewing of clients and consultees (care
providers), namely, the Behavioral Consultation Model developed by
Bergan and his associates (cf. Bergan, 1977). The Behavioral
Consultation Model (cf. Bergan, 1978; Bergan & Tombari, 1975,
1976; Kratochwill & Bergan, 1978a, 1978b) provides a format to
formalize the verbal interactions occurring during behavioral
interviewing. The problem-solving model developed by Bergan and
associates is designed to assist teachers and parents to define
various problems (e.g., academic and emotional), to formulate and
implement plans to solve problems (i.e., behavior intervention
programs), and to evaluate various treatment goals (target of the
interventions) and the effectiveness of educational programs.

The consultation interview format is actually a conceptual
system for solving a variety of problems through an interview
methodology. In this regard, the approach is particularly useful
in psychoeducational assessment of learning and behavior problems.
Consultive problem solving may focus on the achievement of
long-range developmental goals, or it may center on specific
concerns of immediate importance to the child and consultee.
Developmental consultation focuses on behavior change that
typically requires a relatively long period of time to attain.
This form of consultation may require repeated interviews and a
focus on subgoals which are subordinate to long-term objectives.
Thus, repeated applications of the problem-solving process would
be necessary until all the objectives of developmental
consultation are achieved. This form of consultation is possibly
more necessary in treating severe learning and behavior problems.
For example, a child experiencing severe failure in reading (three
or four years behind grade level) could possibly require months of
extensive intervention within this model.

On the other hand, many educational and psychological
problems presented to the professional call for intervention
on a limited number of specific behaviors of immediate
concern to the teacher or parents. Bergan (1977) described
consultation problems of this kind as problem-centered
consultation. For example, consider a relatively specific
problem of a child experiencing a high frequency of errors of
orientation and sequence in handwriting. The majority of the
child's words were written as mirror images of the correct
word. During the plan implementation phase of the interview
sequence, the teacher was requested to say "right" and praise
after each correct response (i.e., writing a word correctly)
and "wrong" and give "corrective feedback" after each incorrect
response. After several repeated applications of this
treatment, the child's writing reversed to normal patterns.
Thus, the consultant's task was completed with the successful
change in handwriting.

There are four stages in the consultation
problem-solving model: namely, problem identification,
problem analysis, plan implementation, and problem
evaluation. These stages (listed in Figure 7.4) describe
the steps necessary to move from an initial designation of
the problem through the plan development and implementation
to achieve problem solution, to the evaluation of goal
attainment and plan effectiveness.

1. Problem Identification. In problem identification the
problem or problems to be solved are specified. A problem is
defined in the context of a discrepancy between observed
behavior and desired behavior (Kaufman, 1971). For example,
a child may know only three of the 26 letters of the
alphabet. The problem is to devise a teaching strategy so
that the remaining letters will be acquired.

Problem identification is achieved primarily by means of
a problem-identification interview (PII). In the interview,
the consultant assists the consultee to describe the problem
of concern to him/her. In the case of a child who has not
learned his/her letters, the consultant might say "tell me
what Jack does when you present him with a letter to be
learned." The question is deliberately phrased so that a
socialization agent (e.g., teacher) will provide a rather
specific description of the problem rather than a global one.

Problem Identification

Problem Analysis

Plan Implementation

Problem Evaluation

Figure 7.4 - Stages in consultative problem solving (Source:
Bergan, J.R., Behavioral Consultation, Columbus,
Ohio: Charles E. Merrill, 1977. Reproduced by
permission).

The focus of the problem may shift throughout the PII,
so that all relevant concerns are identified. Thereafter,
baseline measures, which will provide a level from which to
evaluate the treatment, are discussed. In our letter-learning
illustration, how to measure learning or its absence would be
described (e.g., number of trials, responses, or
presentations).
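
The definition of a problem as a discrepancy between observed and desired behavior, measured against a baseline, can be illustrated with a small sketch. This is only an illustration under stated assumptions: the function name `discrepancy` and the three baseline session counts are hypothetical and do not come from Bergan (1977) or Kaufman (1971); only the goal of 26 letters is taken from the example in the text.

```python
from statistics import mean


def discrepancy(observed: float, desired: float) -> float:
    """Return the gap between desired and observed performance.

    A positive value indicates a behavioral deficit still to be closed.
    """
    return desired - observed


# Hypothetical baseline data: letters named correctly in three sessions.
baseline_sessions = [3, 2, 4]

# The baseline level is the reference point for evaluating treatment.
baseline_level = mean(baseline_sessions)

# Desired behavior in the text's example: all 26 letters acquired.
gap = discrepancy(baseline_level, desired=26)

print(f"baseline level: {baseline_level}; letters still to acquire: {gap}")
```

The same gap would be recomputed from data collected during plan implementation in order to judge whether the level of acceptable performance has been reached.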

2. Problem Analysis. After a problem has been identified,
consultation then focuses on problem analysis. The purposes
of problem analysis are to identify variables that facilitate
a problem solution and to develop a plan to solve the
problem specified in the problem-identification phase of
consultation. Problem analysis is again primarily
accomplished through a problem-analysis interview (PAI). In
the PAI the consultant and consultee discuss client skills
and environmental factors that might be controlling client
behavior. For example, in our letter learning situation the
consultant might suggest some behavioral principles to assist
the teacher in teaching the letters. It might be determined
that feedback and reinforcement need to be presented in a
consistent fashion, or a discrimination procedure developed
with similar letters. Subsequently, a specific plan would be
developed to implement these strategies. The plan
might specify the conditions, time, place, and factors that
facilitate generalization, and so forth.

3. Plan Implementation. The plan implementation phase of
consultation is designed to implement and monitor the plan
designed during problem analysis. Data collection typically
continues so that the consultee will have some indication as
to the effectiveness of the plan. The only interviews that
may occur during this phase are those used to check briefly
with the consultee to determine that there is agreement
between the plan specified and the plan implemented, and to
deal with unforeseen implementation problems. For example, it
might be discovered that the parents are also working with
the child in a fashion that serves only to have the two
instructional strategies working in opposition to each other.
The consultant must then deal with this problem.
4. Problem Evaluation. Problem evaluation takes place
through a formal problem evaluation interview (PEI) and is
conducted to determine if problem solution has been achieved
by comparing data collected during plan implementation with
the level of acceptable performance specified in problem
identification. Moreover, consultation may be terminated if
goals have been met (e.g., if the child acquires his
letters). However, other problems may be introduced, and
consultation may take on the developmental orientation. For
example, it might be determined that the child has learning
problems in math and other areas of reading. Consultation
may then move back to problem analysis and the phase sequence
continues.
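
The sequence of the four stages, including the return from problem evaluation to problem analysis, can be sketched as a small state machine. This is a minimal sketch, not Bergan's (1977) formal procedure: the names `Stage` and `next_stage` are hypothetical, and the rule that an unmet goal also sends consultation back to problem analysis is an assumption, since the text states only what happens when goals are met or when new problems are introduced.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    PROBLEM_IDENTIFICATION = 1
    PROBLEM_ANALYSIS = 2
    PLAN_IMPLEMENTATION = 3
    PROBLEM_EVALUATION = 4
    TERMINATED = 5


# Fixed order of the four stages described in the text.
_ORDER = [
    Stage.PROBLEM_IDENTIFICATION,
    Stage.PROBLEM_ANALYSIS,
    Stage.PLAN_IMPLEMENTATION,
    Stage.PROBLEM_EVALUATION,
]


def next_stage(stage: Stage,
               goal_met: Optional[bool] = None,
               new_problem: bool = False) -> Stage:
    """Return the stage that follows `stage`.

    goal_met and new_problem are consulted only at problem evaluation.
    """
    if stage is Stage.TERMINATED:
        raise ValueError("consultation has already been terminated")
    if stage is not Stage.PROBLEM_EVALUATION:
        return _ORDER[_ORDER.index(stage) + 1]
    if goal_met is None:
        raise ValueError("problem evaluation requires a goal_met decision")
    if goal_met and not new_problem:
        return Stage.TERMINATED
    # Assumption: an unmet goal, like a newly introduced problem,
    # sends consultation back to problem analysis.
    return Stage.PROBLEM_ANALYSIS


# Example: the child now names all 26 letters (acceptable level: 26).
observed, acceptable = 26, 26
following = next_stage(Stage.PROBLEM_EVALUATION,
                       goal_met=observed >= acceptable)
print(following.name)
```

Passing `new_problem=True` at evaluation models the shift toward developmental consultation described above, in which the phase sequence continues for additional problems.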

There are several features that set apart the
consultation interview system from other interview procedures
described in the behavior therapy literature. First, the
model conceptualizes the interview within the context of a
consultant-consultee relationship. Thus, it is implied that
indirect service will be provided to a client through some
mediator (see also Tharp & Wetzel, 1969). Second, in
contrast to the S-O-R-K-C sequence presented earlier (cf.
Kanfer & Phillips, 1970; Kanfer & Saslow, 1969), the
consultation interview system places less emphasis on the O,
or biological condition of the organism. However, it should
be noted that such factors can be coded in the Bergan (1977)
procedures. Third, the consultation interview system is
primarily designed to be employed for academic and social
problems, whereas the other strategies suggested in the
literature have been primarily aimed at behavioral/social
problems. Finally, and perhaps the distinctive feature of
the behavioral consultation interview system, specific
and detailed coding systems have been developed for verbal
interactions occurring during the actual interview (cf. Bergan
& Tombari, 1975). Thus, the consultation-analysis technique
enables the professional to assess the types of
verbalizations emitted in consultation interviews. Since
this is an important feature, it is briefly described here.

5. Message Classification. The classification system
developed by Bergan and his associates (e.g., Bergan &
Tombari, 1975) is intended to articulate with the four-stage
problem-solving model described above. The analysis system
classifies verbal interchange in terms of four categories:
source, content, process, and control. Table 7.5 shows
these four categories and the subcategories associated with
them. The message source category indicates the person
speaking. Content refers to what is being talked about.
Process indicates the kind of verbal action conveyed in a
message, and control refers to the potential influence of a
verbalization by one participant in the interview on what
will be said or done by another participant (see Bergan,
1977, pp. 30-46).

To code events of observation in accordance with the
message classification categories, the behavior consultant
employs a consultation-analysis record form (see Table 7.6).
The consultation-analysis record calls for coding in all four
message-classification categories for each event of
observation. The system is quite complex and requires
extensive training (Brown, Kratochwill, & Bergan, 1981). It
is a useful procedure for psychoeducational assessment and
the most sophisticated interview procedure available to date.
The model clearly represents a comprehensive
assessment system within behavioral psychology, and it also links
assessment to treatment. Since consultation is largely a
matter of verbal interchange between a consultant and
consultee and/or client, emphasis has been placed on the
analysis of verbal behavior. Bergan (1977) suggests that
consultant control of verbal behavior during consultation
necessitates not only recognition of the types of verbal
utterances that have occurred during interviews, but also the
ability to produce different kinds of verbalizations to meet
specific interviewing situations and problems. If a
consultant is trying to elicit information about conditions
controlling client behavior, he or she must be able to
produce the type of verbal utterance most appropriate for the
particular goal.

Table 7.5
Message classification categories and subcategories

Categories          Subcategories
------------------  --------------------------
Message Source      Consultant
                    Consultee

Message Content     Background Environment
                    Behavior Setting
                    Behavior
                    Individual Characteristics
                    Observation
                    Plan
                    Other

Message Process     Specification
                    Evaluation
                    Inference
                    Summarization
                    Validation

Message Control     Elicitor
                    Emitter

Table 7.6
Consultation Analysis Record Form

[Record form not reproduced. Header fields: Consultant,
Consultee, Case Number, Interview Type, Page. Column groups:
Message Source, Message Content, Message Process, Message
Control.]

(From J. R. Bergan & M. L. Tombari, The analysis of verbal
interactions occurring during consultation. Journal of
School Psychology, 1975, 13, 212. Reprinted by permission of
Human Sciences Press, 72 Fifth Avenue, New York, New York
10011. Copyright 1975.)
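
The coding of each verbalization in all four message-classification categories can be sketched as a simple record structure. This is a hedged illustration only: the subcategory labels follow Table 7.5, but the names `CodedMessage`, `CATEGORIES`, and `tally`, the lowercase spellings, and the two sample messages are hypothetical, and the sketch does not reproduce the actual consultation-analysis record form or the coding rules in Bergan (1977).

```python
from collections import Counter
from dataclasses import dataclass

# Subcategories as listed in Table 7.5 (lowercased for this sketch).
CATEGORIES = {
    "source": {"consultant", "consultee"},
    "content": {
        "background environment", "behavior setting", "behavior",
        "individual characteristics", "observation", "plan", "other",
    },
    "process": {
        "specification", "evaluation", "inference",
        "summarization", "validation",
    },
    "control": {"elicitor", "emitter"},
}


@dataclass(frozen=True)
class CodedMessage:
    """One event of observation, coded in all four categories."""
    source: str
    content: str
    process: str
    control: str

    def __post_init__(self) -> None:
        # Reject any code that is not a subcategory of its category.
        for category, allowed in CATEGORIES.items():
            value = getattr(self, category)
            if value not in allowed:
                raise ValueError(f"{value!r} is not a {category} subcategory")


def tally(messages, category: str) -> Counter:
    """Count how often each subcategory of `category` was coded."""
    return Counter(getattr(m, category) for m in messages)


# Hypothetical fragment of a problem-identification interview.
interview = [
    CodedMessage("consultant", "behavior", "specification", "elicitor"),
    CodedMessage("consultee", "behavior", "specification", "emitter"),
]
print(tally(interview, "control"))
```

A tally of this kind is one way a consultant could review how often elicitors were used relative to emitters in an interview; the choice of summary statistic here is an assumption, not part of the source.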

The future will likely see increased sophistication and
use of the interview assessment strategy for purposes of
nonbiased assessment (Reschly, 1981). Despite the
recognized limitations of the interview procedure, several
positive aspects of interview assessment approaches have been
identified (Linehan, 1977, pp. 33-34). First, interview
assessment is a flexible means of obtaining data in that it
can be used to gather both general information covering many
areas of the child's functioning and detailed information in
specific problem areas. Second, variations in the
careprovider's nonverbal and verbal behavior can be examined
in relation to the assessor's questions, thereby allowing an
analysis of responding and lines of further inquiry. Third,
the interview typically promotes the development of a
personal relationship (in contrast to such methods as direct
observation where there may be no interaction between
assessor and careprovider). Fourth, the interview may allow
for potentially greater confidentiality relative to some
other assessment procedures (e.g., rating methods, direct
observation). Fifth, interview assessment may be an
important means of gathering information from individuals
who are unable to provide information through other means
(e.g., those persons with limited communication skills,
mental retardation). Sixth, the interview allows the
assessor to modify his/her questions and responses to fit the
person's conceptual/language system and affords an
opportunity for modification of the interviewee's verbal
description. This advantage must be balanced against the
potential disadvantage of a nonstandardized script which may
promote subjective interpretations.

There are, however, a number of issues that must be
taken into account in the use of interview assessment
strategies. Behavior assessors have long been skeptical of
verbal reports, and with good reason. However, as Evans and
Nelson (1977) have observed, by knowing some of the possible
sources of error and bias in verbal reports it is possible to
reduce distortions in material often unattainable by other
means. They provided several guidelines when dealing with
parental or adult informants (pp. 615-616).

Adult reports: Some cautions.

1. In written and verbal information, factual events
in the child's developmental history are much more
likely to be accurately reported than such
components as attitudes, feeling states, and child
rearing practices.
2. Accuracy does not appear to be increased by
repeated questioning, but can be improved by such
devices as diagrams and by precise statements of
the information required (e.g., McGraw & Malloy,
1941).
3. Poor recall is characteristic of information
related to:
(a) neonatal injuries or complications,
(b) childhood illnesses (e.g., Mednick & Shaffer,
1963),
(c) early attitudes regarding arrival of the baby
(Brekstad, 1966), and
(d) clinic referred behavior problems.
4. Length of time from the event to the interview does
not influence the accuracy as much as the
significance of the event and the current level of
anxiety (arousal) shown by the informant.
5. Distortions are likely to be in the direction of
social desirability (e.g., placing the informant in
a positive light, reporting socially accepted
child-rearing practices).
6. Mothers may be more reliable informants than
fathers, but when independent reports from mothers
and fathers agree, the information is likely to be
more valid.
7. General characteristics of accurate informants have
not been identified.
8. There appears to be no information to suggest that
social class or intellectual differences affect
reliability of retrospective reports.

If the behavior assessor uses parental reports as
evidence of the efficacy of a treatment program or other
scientific conclusion, objective corroboration is required
(cf. Allen & Goodman, 1966; Evans & Nelson, 1977).

Child Reports. Obtaining reliable information
from children also presents a challenge to the behavior
assessor. Generally, we should not expect children's
descriptions of their own behavior to be any more reliable
than those of their adult counterparts. In addition to the
direct interviewing of major socialization agents of the child
with learning and behavior problems, the child should be
considered as an important source of information during
interview strategies.

Although there are some technical guides for
interviewing children (cf. Yarrow, 1960), few behavior
assessors have provided guidelines for this activity. Evans
and Nelson (1977) suggest that there are three major types of
information to obtain from an interview with a child:

1. information that only a child can provide regarding
his/her perception of the problem;
2. likewise, information that only a child can provide
regarding his/her perception of himself/herself; and
3. indications of how well the child can handle
himself/herself in a social situation with an adult.

An issue that typically arises for the behavior
assessor is whether the child should be interviewed with
his/her parents and/or teacher(s). On the one hand, separate
interviews could lead to the child perceiving that adults are
plotting some conspiracy, which may in turn make it harder to
obtain accurate information during subsequent encounters.
On the other hand, interviewing the parents and/or teacher
and child together can provide valuable information. For
example, the child provides the stimulus for certain
questions and issues, as well as an opportunity to respond to
certain points raised. Nevertheless, there will probably be
occasions when a joint interview is aversive for parents,
teachers, and child, because of the others' presence.
Unfortunately, we have no empirical data to provide specific
guidelines for such encounters. From the behavior assessor's
perspective, it would be ideal to have an opportunity for all
conditions (e.g., separate and joint interviews). Such
opportunities would provide independent self-reported
perceptions of the problem, establish some congruence among
parties involved, conversely, establish some areas of
noncongruence, and finally provide an opportunity for each
informant to have their "turn" at providing data relevant to
the problem. However, from a time perspective, such options
may not be possible, putting the burden on the assessor for
the "best guess" as to which direction to go.

Interviewing the Child's Peers. It should also be
mentioned that the child's peers, in addition to the child
himself/herself, could be interviewed to gather relevant
data. Peers can be especially reinforcing (or
non-reinforcing), and may prove helpful in a functional
analysis of the learning or behavior problem. Roff (1970)
noted that an excellent predictor of adult maladjustment is a
reputation as a child for being disliked by one's peers.
Thus, children could be asked which children are having
learning problems and what the possible causes are. There is
also evidence to suggest that peer selection of children to
fill a negative role in a hypothetical class play is a useful
discrimination task (cf. Cowen, Pederson, Babigian, Izzo, &
Trost, 1973). Cowen et al. (1973) compared those adults
appearing on a psychiatric register with matched controls on
a large battery of tests and measures administered when these
individuals were in the third grade. While the measures included
standardized intelligence and achievement tests as well as
personality measures and teacher ratings, the only measure
that discriminated the psychiatric group from the controls
was the negative role variable.

Self-Report and Behavior Checklists and Rating
Scales. Self-report is an indirect assessment procedure
because it represents a verbal description of more clinically
relevant behavior occurring at another time and place.
Self-report assessment has sometimes been based on unreliable
verbalizations in response to unstructured, open-ended
questions. However, a variety of self-report inventories
have been used to structure assessment (Bellack & Hersen,
1977). Behavior checklists and rating scales are
conceptually similar indirect behavioral assessment
strategies. In these strategies the child is asked to rate
another person based upon past observations of that other's
behavior. Due to the diversity of items that are included,
the behavior of actual clinical interest (e.g., academic
performance, social withdrawal) may or may not be involved.
For example, a teacher may be asked to rate a series of
behaviors in addition to the social withdrawal problem (e.g.,
fear, aggression, academic work). Presumably, other relevant
educational problems may emerge from this assessment. Yet,
the major feature of checklist and rating scale assessment
strategies is that the rating occurs subsequent to the actual
behavior of interest (Cone, 1977; Wiggins, 1973).

Self-Report. Due to the perceived problems inherent in
subjective and unsystematic forms of self-report assessment,
various inventories and schedules were developed (Tasto,
1977). Although self-report measures have generally been
avoided by behavioral assessors, recent emphasis on cognitive
processes (e.g., Kanfer & Goldstein, 1975; Thoresen &
Mahoney, 1974) has focused attention on this form of
measurement. Also, as Tasto (1977) has noted, in practice
"the operational criteria for the existence of problems are
self-reported verbalizations" (p. 154). For example, a child
may report that he/she has an academic problem or has no
friends. This report (a self-perception) is an important and
relevant concern in assessment.

Self-report inventories are useful for at least two
functions (Bellack & Hersen, 1977). To begin with,
self-report measures can be useful in gathering data on
motoric responses, physiological activity, and cognitions
(see Figure 7.4). In any particular survey, the items may
tap any of the three content areas or systems described
above. Moreover, each of these questions, with the exception
of cognitions (question 3), can be independently verified
through the actual observation of behavior.

Another function of self-report measures is to gather
data about a child's subjective experience. For example, one
might ask a child "Do you like math?" or "Do you dislike your
peers?" It can be observed that this second set of
questions includes subjective components which are not
objectively verifiable in the same way the first set is.

Numerous variables may influence the type of data one
obtains from self-report and their correspondence to the
actual criterion measure (usually the actual occurrence of
behavior). Important factors include the source of the data
(e.g., written or verbal report by the client), the
form of the questions asked, the content of the questions,
situational factors, and operational specification of terms
(Bellack & Hersen, 1977; Tasto, 1977; Haynes, 1978).

Behavior Checklists and Rating Scales. Many formal
checklists and rating scales have been used in the
educational and behavioral assessment of school age children.
Walls, Werner, Bacon, and Zane (1977) provide a rather
extensive catalogue of available scales, as have other
authors (e.g., Severson, 1971). In many cases, behavioral
assessors use scales originating from many different sources.
As noted above, their use in behavioral assessment is
premised on the nature of the data gathered and how such data
are used in the development of an intervention program.
Behavioral assessors using these procedures must, however,
consider their indirect nature, avoid the hypothetical
constructs sometimes associated with their use, and conduct a
functional analysis in the natural environment once certain
classes of behaviors are identified.

Several positive features of checklists and rating
scales can be identified (Ciminero & Drabman, 1978;
Kratochwill, 1982). First, checklists are typically
economical in cost, effort, and assessor time. This is
particularly the case in contrasting these procedures with
direct observation of behavior in the natural environment.
Second, many checklists are structured so that a relatively
comprehensive picture of the problem can be obtained.
However, such measures usually provide a very global picture
of behavior. Third, due to the diverse range of questions
asked in typical checklists and rating scales, the behavior
assessor may be able to identify problems that were missed
through other assessment methods such as direct
observation and interviewing. Fourth, data obtained from
checklists and rating scales are usually (relatively) "easy"
to quantify (as through factor analysis, multi-dimensional
scaling, latent trait procedures). In this regard, they have
been useful for classification of various behavior disorders
(cf. Quay, 1979). Fifth, checklists and rating scales
frequently may provide a useful measure for pre- and post-test
evaluation of an intervention program. Sixth, such measures
are frequently a convenient means of obtaining social
validity data on therapeutic outcomes (cf. Kazdin, 1977;
Wolf, 1978).

A number of considerations must be taken into account in
the use of rating scales and checklists. Conceptual and
methodological issues have been raised over their use in both
research and practice (e.g., Anastasi, 1976; Ciminero &
Drabman, 1978; Evans & Nelson, 1977; Kratochwill, 1982;
Severson, 1971; Spivack & Swift, 1973; Walls et al., 1977).
A major problem with these procedures is that they represent
an indirect dimension of assessment. Since data are gathered
retrospectively, their relation to actual occurrences of the
target behavior in the natural environment may be less than
perfect. Second, while it appears that rating scale
constructors have some criteria for generating items
included in the scale, the rationale may not be evident or
remains unspecified. In this regard it is not always clear
how items may relate to each other. Third, it is frequently
unclear under what conditions the scale should be
administered (e.g., at what time after observing the
behavior). Fourth, there is often no clear rationale for the
manner in which rating scale constructors rate the presence
or absence of a particular kind of behavior and the kinds of
categories employed to code various behaviors. Fifth, scales
and checklists are also characterized by considerable variation
within a particular scale with regard to the kinds of
judgments required. Sixth, a large number of rating scales
are constructed to detect the presence of negative behaviors
or problems (i.e., behavioral excesses and deficits) and less
frequently focus on positive behaviors (assets). Finally,
many published scales fail to meet standards for reliability,
validity, and norming (cf. Walls et al., 1977).

Rating scales and checklists will likely continue to be
used extensively in behavioral assessment in schools and
other applied settings. A continuing reason for their
popularity typically relates to the general ease with which
such devices are administered (but not necessarily
interpreted). Nevertheless, the rating scale and checklist
user should consider the aforementioned conceptual and
methodological limitations if he/she wishes to reduce bias in
the assessment process.

Self-Monitoring. Self-monitoring refers to the act of a
child in which some occurrence(s) of his/her behavior are
discriminated and then recorded. This procedure is regarded
as a direct assessment procedure in that behavior is recorded
at the time of its actual occurrence. Several major sources
have provided a review of the applications of self-monitoring
(e.g., Ciminero, Nelson, & Lipinski, 1977; Haynes,
1978, Chapter 9; Kanfer & Phillips, 1970; Kazdin, 1974;
Mahoney, 1977; McFall, 1977; Nelson, 1977; Watson & Tharp,
1974; Workman & Hector, 1978). Self-monitoring (SM) can be
used for both assessment and treatment of various problem
behaviors. Its uses in assessment and in treatment involve
somewhat different considerations.

Self-Monitoring Assessment. When SM assessment is
employed, data on the child's behavior are useful for at
least two reasons. First, the client may be requested to SM
during the initial stages of educational assessment when the
professional is attempting to identify specific problems. In
this regard, baseline response levels help verify the
existence of a problem. SM may also be used to gather
information on how successful the intervention program is.
The range of application of SM to various target behaviors
has been quite extensive and the interested reader is
referred to the references listed above for examples.

Many different recording devices and methods have been
used for SM assessment. Some of the more common include
record booklets, checklists, forms, counters, timers, meters,
measures, scales, residual records (e.g., empty pop bottles),
archival records (e.g., telephone bills), diaries, and many
others.

When SM is used for assessment, a number of variables
influence the reliability and validity of the data. Both the
accuracy and reactivity of SM have been identified as factors
influencing the data (see Table 7.7). The accuracy of SM
depends on the following 10 factors (McFall, 1977, pp.
200-201):

1. Training. Children should be trained in the use of
SM. Training will generally result in better
accuracy and increase the credibility of
assessment.

2. Systematic methods. Systematic SM methods will
usually result in more reliable and accurate
measures than those that are more informal and
nonsystematic.

3. Characteristics of the SM device. An SM device
which allows simple data collection, and which does
not depend heavily on the child's memory, will
usually provide more accurate data in assessment.

4. Timing. In general, the closer in time the actual
SM act is to the occurrence of the target behavior,
the more likely the data will be accurate.

5. Response competition. When a child is required to
monitor concurrent responses, his/her attention is
divided. This may cause interference and thereby
reduce the accuracy of the SM data.

6. Response effort. The more effort (i.e., time and
energy) the child must spend on the SM activity,
the less accurate the data may be.

7. Reinforcement. Contingent positive reinforcement
for accurate recording will usually increase
accuracy. Some external criterion can also be
established for accuracy improvement.

8. Awareness of accuracy assessment. The professional
should monitor the child's data and make him/her
aware that accuracy is being monitored. Such
awareness will usually increase accuracy.

9. Selection of target behaviors. Since some
behaviors are more salient, more easily
discriminated, or more memorable, variations in
accuracy will occur as a function of these
dimensions. Generally, higher levels of accuracy
have been established on motor behaviors (e.g.,
head touches) than verbal behaviors (e.g., number
of times the person says "you know") and positively
valued behaviors are more accurately recorded than
those that are negatively valued.

10. Characteristics of child. Some children are more
accurate recorders than others. One would
generally expect young children to be less accurate
than older children, adolescents, and adults.
However, individual variations will occur within
ages.

Table 7.7

Factors Influencing Self-Monitoring Assessment

                              Factor
Accuracy                                Reactivity
 1. Training                            1. Motivation
 2. Systematic methods                  2. Valence
 3. Characteristics of monitoring       3. Target behaviors
    device
 4. Timing                              4. Goals, reinforcement,
                                           and feedback
 5. Response competition                5. Timing
 6. Response effort                     6. Self-monitoring devices
 7. Reinforcement                       7. Number of target behaviors
 8. Awareness of accuracy assessment    8. Schedule of self-monitoring
 9. Selection of target behaviors
10. Characteristics of child

Reactivity may also be problematic when SM is undertaken
because unintended or unwanted influences caused by the act
of recording yield data that are not representative of what
would have occurred had SM not been used. McFall (1977,
pp. 202-204) presented eight variables that should be
considered:

1. Motivation. Children who are motivated to change
their behavior prior to engaging in SM are more
likely to demonstrate reactive effects.

2. Valence. Depending on how children value a
particular SM behavior, it may or may not change.
Generally, positively valued behaviors are likely
to increase, negatively valued behaviors are likely
to decrease, and neutral behaviors may not change.

3. Target behaviors. The nature of the target
behavior for SM may influence reactivity. Also,
the number of target behaviors monitored at one
time may produce different reactive effects.
Sometimes, two behaviors being monitored may be
more reactive than one.

4. Goals, reinforcement, and feedback. Specific
performance goals, feedback, and reinforcement
scheduled as part of SM will increase reactivity.

5. Timing. Reactivity may vary as a function of the
timing of SM. As the time between the natural
occurrence of a behavior and the recording of the
behavior increases, reactivity may decrease.

6. Self-monitoring devices. Generally, the more
obtrusive the recording device, the more reactive
it tends to be (e.g., a hand held timer is more
reactive than one that is out of sight and
"awareness").

7. Number of target behaviors. As the number of
target behaviors being monitored increases,
reactivity may decrease.

8. Schedule of self-monitoring. Continuous SM may be
more reactive than intermittent SM.

The 10 accuracy variables and the eight reactivity
variables may be problematic in assessment. When SM is used
as an intervention, somewhat different concerns must be
considered.

Self-Monitoring as an Intervention. Self-monitoring is
frequently used as a therapeutic technique and it often has
been used as one component of a more complete system of
behavioral self-control (Thoresen & Mahoney, 1974). A total
program might include the following components: (a)
self-assessment, where the child examines his/her own behavior
and determines whether or not he/she has performed certain
behaviors; (b) self-monitoring; (c) self-determination of
reinforcement, wherein the child determines the nature and
amount of reinforcement he/she should receive contingent upon
the performance of a given class of behaviors; and (d)
self-administration of reinforcement, wherein the child
dispenses his/her own reinforcement (self-determined or not)
contingent upon performance of a given class of behaviors
(Glynn, Thomas, & Shee, 1973).

When SM is used as an intervention, accuracy and
reactivity take on quite different roles. Accuracy plays a
minor role in fostering therapeutic change since, regardless
of whether or not children monitor accurately, SM may produce
a positive behavior change. While reactivity is something to
minimize in assessment, it is usually fostered to maximize
therapeutic change. Not all reactivity may be therapeutically
desirable and the professional must arrange conditions
so as to facilitate positive reactive change.

Positive Characteristics of SM. Despite some potential
methodological limitations, SM may be advantageous for
behavioral assessment for several reasons. First, it is a
relatively cost-efficient means of assessment relative to
such techniques as observational assessment. However, the
professional must take into consideration such factors as
the training time and monitoring in such a cost analysis.
Second, SM may be the only assessment option, as in
measurement of private behaviors (thoughts). Third, SM can
minimize the sometimes obtrusive effects of assessment that
occur with other assessment procedures (e.g., interview,
direct observation). Fourth, SM can help verify the
existence of a problem in combination with other assessment
methods.

Analogue Assessment. Another direct assessment procedure
requires clients to respond to stimuli that simulate or
approximate those found in the natural environment. In such
assessment analogues the child is usually requested to role
play or perform as if he/she were in the natural environment.
Analogue assessment procedures have been used for many years
within behavior therapy, but it is only recently that
systematic features have been outlined and advantages and
disadvantages considered (cf. Haynes, 1978; McFall, 1977;
Nay, 1977). Relative to direct naturalistic assessment,
analogue methods offer several positive contributions.
First, particularly in research, they permit increased
opportunities for experimental control. This positive
feature may also emerge when analogue assessment is being
used for clinical purposes. Many variables operating in the
natural environment contaminate assessment efforts and
analogue assessment may reduce these. In this regard the
professional may be able to gain a good perspective on the
problem free from some of the contaminating factors usually
present in the natural setting (e.g., classroom). Second,
analogues may reduce the amount of distortion that sometimes
occurs when an observer is present in naturalistic settings.
Third, analogue assessment may allow assessment of behaviors
which are impossible to monitor in naturalistic settings.
Fourth, relative to direct observational assessment
procedures, analogue strategies may be less costly on several
dimensions. Fifth, analogue assessment may help simplify and
reduce complex problems. Through analogue assessment we may
be able to control extraneous influences, isolate and
manipulate specific variables, and reliably measure their
effects. Sixth, analogue assessment procedures may help
professionals avoid certain ethical problems that emerge in
naturalistic observation. Thus, under analogue assessment
conditions the professional may be able to test a procedure
to learn about its characteristics prior to implementing it
in the natural environment.

Five categories of analogue methods have been identified
by Nay (1977): paper and pencil, audiotape, videotape,
enactment, and role play analogues. Paper and pencil
analogues require the child to note how he/she would respond
to a stimulus situation presented in written form. For
example, teachers may be asked to respond to a series of
multiple choice questions which depict different options to
follow in implementing behavior management procedures. In
paper and pencil analogues the stimulus situations are
presented in a written mode with response options written,
verbal, and/or physical. The child is usually presented the
stimulus and a cue for a response is made. The response made
may be verbal in that the child is asked to describe what
he/she would do and/or physically respond as he/she typically
would. While a major advantage of these procedures is that
they can be given to large numbers of children at the same
time and that they are easily quantified, the predictive
utility of these procedures usually remains unknown.
Moreover, this type of measure is limited because the
professional does not observe overt behavior in response to
the actual stimulus.

Audiotape analogues present stimulus items in some type
of auditory format. Some characteristics of these
procedures include a set of instructions to the child and a
series of audio situations presented by the professional.
The child is typically required to make a verbal or other
physical response. For example, the professional may present
audio transcripts of a teacher presenting information to a
class of school age children. The child may be requested to
respond through role play or free behavior. Although the
audio analogue shares many of the advantages of the paper and
pencil analogue, it may not approach realistic stimulus
conditions.

The videotape analogue uses video technology to present
a relatively realistic scene for the child. In this regard
it can closely approximate the naturalistic setting. Most
often both audio and visual components are used. Video
analogues can also be used for training or intervention, as in
the teaching of social or academic skills. Cost and
availability of the video equipment represent major
limitations of this procedure.

Enactment analogues require the child to interact with
relevant stimulus persons (or objects) typically present in
the natural environment within the contrived situation.
Sometimes the professional may bring relevant stimulus
persons (e.g., peers, teachers) into the assessment setting
to observe child responses, as has been done in assessment
and treatment of selective mutism (Kratochwill, Brody, &
Piersel, 1979). A major advantage of this approach is that
stimuli can be arranged to be nearly identical to the natural
environment. Yet, a limitation of this procedure is that the
situation may still not duplicate the natural environment.

The role-play analogue can be used within the context of
any of the aforementioned assessment procedures. Sometimes a
script is presented and the child is asked to covertly
rehearse or overtly enact certain behaviors under various
stimulus situations. A professional may ask a student to
role-play asking for a teacher's assistance to assess various
preacademic skills. The child may play himself/herself or
someone else. Specific instructions may be present or
absent. Flexibility in format is a major advantage of this
procedure, as is the option for direct measurement of the
behavioral responses. As is characteristic of other analogue
assessment procedures, a major disadvantage is the potential
lack of a close match between the analogue and the natural
environment.

The analogue assessment procedure presents many behavior
assessment options in educational settings. Nevertheless,
both reliability and validity issues need to be addressed
when these procedures are used (Nay, 1977). Therapists
employing these procedures should assess reliability data on
target responses. A check on the validity of the analogue is
made by comparing the contrived assessment with the target
behaviors occurring in the natural environment. As is true
of other assessment procedures, analogue assessment may best
be used as one of several techniques to assess behavior.
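
The validity check described above, comparing the contrived
assessment with behavior in the natural environment, can be
illustrated numerically. The following is a minimal sketch only.
It assumes (this is not specified by Nay, 1977) that the same
target response has been scored for several children in both
settings, and it uses a Pearson correlation as one possible
index of correspondence. The function name and the sample
scores are hypothetical.

```python
from math import sqrt


def pearson_r(analogue, natural):
    """Pearson correlation between paired analogue and natural-setting scores."""
    if len(analogue) != len(natural) or len(analogue) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    n = len(analogue)
    mean_a = sum(analogue) / n
    mean_n = sum(natural) / n
    # Sum of cross-products and sums of squares about each mean.
    cov = sum((a - mean_a) * (b - mean_n) for a, b in zip(analogue, natural))
    ss_a = sum((a - mean_a) ** 2 for a in analogue)
    ss_n = sum((b - mean_n) ** 2 for b in natural)
    if ss_a == 0 or ss_n == 0:
        raise ValueError("scores must vary in both settings")
    return cov / sqrt(ss_a * ss_n)


# Hypothetical data: frequency of a target response for five children,
# scored once in a role-play analogue and once in the classroom.
analogue_scores = [2, 4, 5, 7, 9]
natural_scores = [1, 3, 6, 6, 10]
print(round(pearson_r(analogue_scores, natural_scores), 2))
```

A high coefficient would suggest that children keep roughly the
same rank order across the two settings. It would not show that
absolute levels of the behavior match, so it is best read
alongside the raw scores.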

Direct Observational Assessment. Direct observational
assessment is a most commonly used procedure in behavioral
research and practice. Jones, Reid, and Patterson (1975)
summarized three major characteristics of a "naturalistic
observational system", including "recording of behavioral
events in their natural settings at the time they occur, not
retrospectively; the use of trained impartial
observer-coders, and descriptions of behaviors which require little if
any inference by observers to code the events" (p. 46).

Observational assessment strategies are commonly
affiliated with behavioral approaches (e.g., Johnson &
Bolstad, 1973; Jones et al., 1974; Kent & Foster, 1977;
Lipinski & Nelson, 1974) but are not limited to this
orientation. They are used in rather diverse areas of
psychology and education (e.g., Boehm & Weinberg, 1977;
Cartwright & Cartwright, 1974; Flanders, 1966, 1970; Hunter,
1977; Lynch, 1977; Medley & Mitzel, 1963; Rosenshine & Furst,
1973; Sackett, 1978a, 1978b; Sitko, Fink, & Gillespie, 1977;
Weick, 1968; Weinberg & Wood, 1975; Wright, 1960).

The rather extensive literature in this area does not
allow a thorough presentation (see Haynes, 1978, for more
detailed coverage in behavior therapy). When used in
educational and psychological assessment, many issues emerge
over the utility of these procedures. One major issue in
their use in clinical assessment is the distinction between
observational procedures and actual observation instruments
(cf. Kratochwill et al., 1980). Most professionals have used
some type of observational procedure in their assessment
work. This may take the form of their direct observation of
a child in a classroom or having a parent or teacher record
the occurrence of some behavior. Figure 7.5 presents an
example of a record form used by a special education teacher
who had an aide observe a child over a one week period.
While observational measurement procedures may vary
considerably along a number of dimensions (e.g., the person
observing, the target response, the sophistication of the
form), they are most commonly used as part of a more general
assessment battery.

In contrast to these observational procedures, there are
relatively few specific observational instruments in use in
behavioral assessment. The paucity of instruments for direct
observational assessment is likely due to the lack of
attention to the development of these scales and the need to
design assessment forms for specific situations and problems
(Mash & Terdal, 1981).

Figure 7.5 - An example of a multiple behavior recording format
used for direct observational assessment. Numbers
across the top of this sample block indicate 10-second
observe, 10-second record intervals. Target behaviors
for the child and peer are listed down the left margin.
Problem behaviors (target) for the child are coded as
Talking (T), throwing objects (TO), and out of seat (O).
Problem and desirable child behaviors are mutually
exclusive in any one 10-second interval.

Among the instruments that have been developed, most
focus on a rather specific range of behaviors (e.g.,
Alevizos, DeRisi, Liberman, Eckman, & Callahan, 1978;
O'Leary, Romanczyk, Kass, Dietz, & Santogrossi, 1971;
Patterson, Ray, Shaw, & Cobb, 1969; Wahler, House, &
Stambaugh, 1976). Each of these coding systems is used in
different settings, including institutional program
evaluation (Alevizos et al., 1978),⁸ home (Patterson et al.,
1969),⁹ school (O'Leary et al., 1971),¹⁰ and home and school
(Wahler et al., 1976).¹¹ Each of these systems represents a
promising observational instrument for assessment in research
and practice (see Ciminero & Drabman, 1977, for a brief
overview).

Direct observational measurement is usually the
preferred method of assessment in therapy and research. Yet,
the number of methodological issues that have been raised in
recent years has made this assessment procedure complex
(cf. Ciminero & Drabman, 1977; Gelfand & Hartmann, 1975;
Haynes, 1978; Johnson & Bolstad, 1973; Kazdin, 1977; Kent &
Foster, 1977; Wildman & Erickson, 1978). From the available
literature have come some recommendations that can make this
form of assessment more credible in practice (some specific
factors that bias this form of assessment were reviewed in
Chapter 5). First, individuals functioning as observers
should be well-trained. Training should include samples of
behavioral sequences and environmental settings which closely
resemble the behaviors and settings in which actual data
collection will occur.

Second, two or more observers should be involved in
assessment efforts to establish interobserver agreement on
the response measures. Observers should be trained together
and the scores compared with a single formal criterion, and
training should be long enough to ensure that there is
agreement to a specified criterion on each code.

Third, the conditions for assessing observer agreement
should be maintained to ensure consistent levels of
agreement. Continuous overt monitoring and covert monitoring
may help generate stable levels of agreement (cf. Wildman &
Erickson, 1978).

Fourth, observer bias can be reduced by not
communicating the specific intervention plan to the
observer(s). Possibly, explicit instructions to the observer
indicating that the specific outcomes are unknown may be
preferable to completely avoiding the topic.

Fifth, in the absence of instruments or coding sheets
for a target problem, specific observational codes should be
constructed so that behaviors can be easily rated. The
professional should typically be conservative in the number
of codes that are to be rated at any one time.

Sixth, operational definitions should be constructed for
each specific behavior to be observed. Definitions should
also be tested to ensure that two independent observers can
obtain and maintain high levels of interobserver agreement.

Seventh, observations should be conducted in an
unobtrusive fashion. To assist in the examination of
obtrusiveness, data should be monitored for evidence of
reactivity or bias.

Eighth, measurement of the generality of observational
data across different settings should be conducted. While
direct observations should take place in the settings in
which the target behavior has been identified, multiple
assessment across behaviors and settings will further
elucidate the extent of the problem and help monitor
intervention effects.

Finally,   normative  data  are  quite  desirable  in  many 
cases  and  should  be  considered   in  observational  assessment. 
Normative  data  can  help  objectively  identify  behavioral 
excesses  and  deficits   in  a  given  client    (Hartmann  et  al., 
1979;   Nelson  &  Bowles,  1975). 
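
The interobserver agreement called for in the second, third, and
sixth recommendations above can be quantified in several ways. The
sketch below is illustrative only. It assumes interval-by-interval
records from two observers using the Figure 7.5 codes (T, TO, O)
plus a hypothetical "-" code for intervals with no problem
behavior. It reports simple percent agreement and Cohen's kappa, a
chance-corrected index that the source does not itself prescribe.
Function names and sample records are hypothetical.

```python
from collections import Counter


def percent_agreement(obs_a, obs_b):
    """Proportion of intervals in which both observers recorded the same code."""
    if len(obs_a) != len(obs_b) or not obs_a:
        raise ValueError("records must be non-empty and of equal length")
    agreements = sum(a == b for a, b in zip(obs_a, obs_b))
    return agreements / len(obs_a)


def cohens_kappa(obs_a, obs_b):
    """Agreement corrected for the agreement expected by chance alone."""
    n = len(obs_a)
    p_observed = percent_agreement(obs_a, obs_b)
    count_a = Counter(obs_a)
    count_b = Counter(obs_b)
    # Chance agreement: product of each observer's marginal proportions per code.
    p_expected = sum(count_a[code] * count_b[code] for code in count_a) / (n * n)
    if p_expected == 1:
        raise ValueError("kappa is undefined when both observers use one code only")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical records for ten consecutive 10-second intervals.
observer_1 = ["T", "-", "-", "O", "T", "-", "TO", "-", "-", "T"]
observer_2 = ["T", "-", "T", "O", "T", "-", "TO", "-", "-", "-"]

print(percent_agreement(observer_1, observer_2))  # 0.8
print(cohens_kappa(observer_1, observer_2))
```

Percent agreement can look high simply because most intervals
contain no target behavior, which is why a chance-corrected index
is often reported alongside it. The level of agreement required
before data collection begins is a decision for the professional.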

While direct observational measures will likely remain
an important procedure within behavioral assessment, much
work remains to be done to make this form of assessment less
expensive, less time consuming, and more versatile. Because
this strategy involves less inference about a particular
behavior relative to many traditional assessment practices,
and because it emphasizes a repeated assessment of the child
across various phases of intervention, it should be used as
often as possible.
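
The use of normative data recommended above can be sketched as
a comparison of a target child's observed rate of behavior with
the rates of classroom peers observed under the same conditions.
This is a minimal illustration, not a procedure given in the
source. The two-standard-deviation cutoff, the function name, and
the sample rates are all assumptions chosen for the example.

```python
from statistics import mean, stdev


def classify_against_norm(child_rate, peer_rates, cutoff=2.0):
    """Return (z, label) for a child's rate relative to a peer sample.

    The cutoff (in standard deviation units) is an arbitrary
    illustrative value, not an established clinical standard.
    """
    if len(peer_rates) < 2:
        raise ValueError("need at least two peer observations")
    spread = stdev(peer_rates)
    if spread == 0:
        raise ValueError("peer rates must vary to compute a z score")
    z = (child_rate - mean(peer_rates)) / spread
    if z >= cutoff:
        label = "excess"
    elif z <= -cutoff:
        label = "deficit"
    else:
        label = "within normative range"
    return z, label


# Hypothetical data: percentage of intervals out of seat for five peers,
# and for the referred child, observed in the same classroom.
peer_out_of_seat = [10, 12, 8, 10, 10]
z, label = classify_against_norm(20, peer_out_of_seat)
print(round(z, 2), label)
```

With a small peer sample the standard deviation is unstable, so
a result of this kind is better treated as one piece of
descriptive evidence than as a decision rule.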



Psychophysiological Assessment. Behavior assessors have
increasingly focused on psychophysiological measures of
behavior in part due to the growing emphasis on the three
response systems. Psychophysiological measurement is defined
as "the quantification of biological events as they relate to
psychological variables" (Kallman & Feuerstein, 1977, p.
329). The rise in interest in psychophysiological measures
in child assessment is also due to increased sophistication
in instrumentation, increased use of biofeedback and other
behavioral procedures to treat psychophysiological disorders,
and the finding that independent measures of physiological
responding do not correlate perfectly with verbal reports and
overt behavior, thus making it increasingly desirable that
they be used (Ciminero & Drabman, 1978). It is generally
recognized that psychophysiological assessment is in its very
early stages of development relative to other assessment
procedures. However, its use in schools has been
particularly limited. Psychophysiological assessment in
schools and other settings necessitates the consideration of
several issues (Kallman & Feuerstein, 1977). First, due to
their complexity and expense, physiological recordings should
provide data that cannot be obtained as reliably and
efficiently by other procedures. Second, within
educational settings, psychophysiological assessment should
provide the professional with information about the selection
and evaluation of an intervention strategy. Third,
psychophysiological assessment procedures should possess
adequate reliability and validity for their use.
Several classes of problems have been identified when
physiological measures are used (Hersen & Barlow, 1976).
These factors may limit the utility of this form of
assessment. First, equipment used to monitor physiological
responses is sometimes characterized by mechanical failure.
Second, the professional using physiological measurements
must allow time for adaptation during various phases of
assessment. Sometimes physiological measurement is
initially reactive and this effect must be eliminated so that
the effects of the intervention can be separated from the
effects of reactivity alone. When physiological responses
are repeatedly measured, habituation and adaptation may be
problematic. In this regard, the effect of the intervention
must be distinguished from mere habituation or adaptation to
recording. Third, various assessor and contextual variables
may interact with the physiological measures. Fourth, when
various physiological response systems (e.g., GSR, heart
rate, blood pressure, etc.) are used as indices of emotional
arousal, the specific emotion experienced by the child cannot
be assumed to occur in the absence of a self-report
confirmation from the child. Finally, there appears to be
some evidence for individual differences in autonomic
reactivity. For example, different peripheral autonomic
systems may show low or inconsistent correlations across
clients (Zuckerman, 1970).

As with other behavioral procedures, psychophysiological
assessment can provide important information for the design
of intervention programs. Yet its increased use in school
settings will likely be slow because of cost and efficiency
considerations.

Criterion-Referenced Assessment

Criterion-referenced assessment has been closely aligned
with, but not limited to, the behavioral paradigm (Bijou, 1976;
Cancelli & Kratochwill, 1981). Since criterion-referenced
tests were first introduced (Glaser & Klaus, 1962), efforts to
clarify the term, and the issues that must be addressed in its
use, have proliferated (cf. Hambleton, Swaminathan, Algina,
& Coulson, 1978). In the early literature criterion-referenced
tests were considered precise measures of highly specific
discrete behavior capabilities. Such behaviors were purported
to be hierarchically sequenced, as derived through task
analysis procedures (cf. Gagne, 1961, 1968; Resnick, Wang, &
Kaplan, 1973). Glaser (1971) provided an early definition of
a criterion-referenced test:

     A criterion-referenced test is one that is
     deliberately constructed to yield measurements that
     are directly interpretable in terms of specified
     performance standards. Performance standards are
     generally specified by defining a class or domain
     of tasks that should be performed by the
     individual. Measurements are taken as
     representative samples of tasks drawn from the
     domain and such measurements are referenced
     directly to this domain for each individual
     (p. 41).

Within this conceptualization, the term
domain-referenced test has evolved. Thus, whether one
prefers the term criterion-referenced (Hambleton et al.,
1978) or domain-referenced (Subkoviak & Baker, 1977), it is
generally assumed that the concept of "domain" is implied.
Nevertheless, these notions about criterion-referenced tests
have evolved outside a behavioral orientation. It appears
most useful to consider that performance on a
criterion-referenced test is a function of the immediate test
situation and the previous interactions that comprise the
history of the child (Bijou, 1976). Specific responses to
items on a criterion-referenced test may be due to (a) the
nature of the test items, and (b) the setting factors in
taking the test.

Behavioral assessors have noted that criterion-/domain-
referenced tests are greatly improved with an empirical
validation of homogeneous item domains (e.g., Bergan, 1978;
[illegible], 1978; Dayton & Macready, 1976; Macready & Merwin,
1973). However, until recently, procedures for establishing
homogeneous item domains have not been available (e.g.,
latent structure analysis). With the development of
procedures for empirically validating the scope and sequence
of domains of homogeneous items, a new form of
criterion-/domain-referenced assessment, labeled
path-referenced assessment (Bergan, 1978, 1980), has been
developed. This assessment procedure provides information
about the client/learner which allows specific identification
of skill and/or domain deficiencies as well as the sequence

or path of curriculum instruction that will lead most
efficaciously to mastery of the task identified.

Criterion-/domain-referenced tests have generally been
used for three purposes within educational settings (Bijou,
1976): (a) to diagnose problem behavior, (b) to monitor
learning, and (c) to assess readiness for placement in a
prescribed educational program. A central theme in their use
is that they measure a child's competence in a particular
area and assist in the design of a specific instructional
program. Yet several criticisms of criterion-/domain-
referenced assessment have emerged related to the lack of
normative comparisons in the assessment activity (e.g., Ebel,
1970; Hofmeister, 1975). As commonly used, criterion-/
domain-referenced measures do not provide normative data, a
characteristic deemed desirable by some professionals.

A response to this issue of normative data must take
into consideration that norm-referenced and criterion-
/domain-referenced tests are really designed for different
purposes. Items on criterion-referenced tests are
typically selected randomly from each domain during test
construction, while psychometric theory governing the selection
of items for norm-referenced devices suggests that, in order
to discriminate between good and poor learners, items which
are passed by half a sample of the population are best
(Subkoviak & Baker, 1977). Individuals desiring normative
information from criterion-/domain-referenced assessment
should consider the use of social validation as an
alternative to psychometrically established norms (cf.
Kazdin, 1977; Wolf, 1978). This extension involves social
validity (Kazdin, 1977), a procedure which refers to
assessing the social acceptability of some intervention.
Wolf and his associates suggested that interventions be
socially validated (e.g., Maloney, Harper, Braukmann, Fixsen,
Phillips, & Wolf, 1976; Minkin, Braukmann, Minkin, Timbers,
Fixsen, Phillips, & Wolf, 1976; Phillips, Phillips, Wolf, &
Fixsen, 1973; Wolf, 1976). Kazdin (1977) reviewed several
facets of social acceptability:

     Initially, the acceptability of the focus of the
     intervention can be assessed. This aspect of
     social acceptability refers to whether the
     behaviors selected are important to individuals in
     the natural environment. Second, the acceptability
     of the procedures can be assessed. Presumably,
     many procedures might alter behavior (e.g.,
     reinforcement of a particular response, time out,
     shock). Acceptability of, or consumer satisfaction
     with, the procedure can be determined and used as a
     basis for selecting among effective techniques.
     Finally, the importance of the behavior change
     achieved with treatment can be validated by
     examining the change in light of the performance of
     the nondeviant peers in the environment or through
     evaluations by individuals in everyday contact with
     the client (p. 430).

An important component of social validation involves
determining whether behavior change is clinically relevant
for the child. One way that this might be accomplished is in
assessing the child's functioning in the environment after
the academic performance has been achieved. In this case,
validation of intervention effects can be accomplished in two
ways, namely, social comparison and subjective evaluation.
Both of these involve somewhat different considerations and
methods.

Social comparison requires that the professional
identify individuals similar to the child in subject and
demographic variables, but who differ in performance on the
target behavior (e.g., knows the multiplication tables 1
through 12). Kazdin (1977) suggested two ways for this
assessment to be conducted. First, a target behavior is
assessed, determined to be deficient, and therefore judged to
warrant an intervention. Second, the level of performance of
peers who do not warrant an intervention could serve as the
criterion for the intervention on the deviant child. Thus,
if the intervention is effective, the child's academic
behavior should fall within the normative level of peers.
Research reviewed by Kazdin (1977) suggests that many applied
intervention programs whose target behaviors involve social
behavior patterns have successfully used social comparison
studies. For example, O'Connor (1972) developed social
interaction in nursery school children with modeling or
modeling combined with shaping. Prior to treatment, isolate
children were below the level of their non-isolate peers in
such behaviors as proximity to others and visual and verbal
contact with peers. After treatment, social interaction in
the classroom of the trained children surpassed the level of
their non-isolate peers, an effect that was maintained up to
six weeks of follow-up. These results suggest that the
magnitude of change was clinically important in that setting.
There appear to be fewer studies that provide good examples of
social validation of academic skills.
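
The social comparison logic described above can be illustrated
with a brief computational sketch. The Python fragment below is
an illustration only, under stated assumptions: the function
name, the peer scores, and the band of one standard deviation
around the peer mean are hypothetical and are not taken from
Kazdin (1977) or from any study cited here. It simply asks
whether a child's score falls within the range of a comparison
group before and after an intervention.

```python
from statistics import mean, stdev


def within_normative_range(child_score, peer_scores, band_sd=1.0):
    """Return True if the child's score lies within band_sd standard
    deviations of the peer mean (band width is an assumed convention)."""
    peer_mean = mean(peer_scores)
    peer_sd = stdev(peer_scores)  # sample standard deviation of the peers
    return abs(child_score - peer_mean) <= band_sd * peer_sd


# Hypothetical data: number of multiplication facts answered correctly.
peer_scores = [18, 20, 22, 19, 21, 20]
baseline_score = 9   # target child before the intervention
post_score = 19      # target child after the intervention

print(within_normative_range(baseline_score, peer_scores))
print(within_normative_range(post_score, peer_scores))
```

With these illustrative numbers the baseline score falls outside
the peer band and the post-intervention score falls inside it.
The width of the band and the choice of peers are judgments made
by the professional, not properties of the data.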

Subjective evaluation of intervention effects consists
of judgments about the qualitative aspects of performance.
Presumably the academic performance that has been altered can
be observed by individuals who are in the natural environment
with the child (teacher) or who are in a special position
through training and professional skills (e.g., special
education teacher, psychologist) to judge the behavior. This
form of evaluation is quite common in applied behavioral
research. For example, subjective evaluations have been used
with reinforcement techniques designed to alter compositional
responses of elementary school children (e.g., amount of
writing, use of words not previously used, varied sentence
beginnings). Subjective evaluations by adults including
teachers or college students have revealed that compositions
completed after training are rated qualitatively better than
those completed during baseline (e.g., Brigham, Graubard, &
Stans, 1972; Maloney & Hopkins, 1973; Van Houten, Morrison,
Jarvis, & McDonald, 1974). Both social comparison and
subjective evaluations can be employed to evaluate treatment
effects.

Social validation represents an important alternative
for those individuals wishing to evaluate the effects of
interventions in applied settings. Moreover, such
evaluations provide an alternative to conventional
norm-referenced tests. However, these procedures are not
without their problems. Normative standards may be an
inappropriate criterion against which to evaluate change. As
Kazdin (1977) has noted, a goal might even be to change the
normative level. For example, one of the authors
(Kratochwill) has worked with teachers who argue that reading
and many readiness skills should not be taught in
kindergarten. In this situation the goal would be to achieve
a new level that would be desirable for both the children and
for teachers in later grades.

As Kazdin (1977) has noted:

     ...classroom applications might bring the academic
     performance of a student up to the level of his or
     her peers. While this would be a successful
     intervention in some sense, whether normative
     levels should ever serve as a standard might be
     questioned. Normative levels of academic
     performance in most classrooms can be readily
     accelerated with reprogramming teacher behavior and
     curricula (p. 439).

The same issue can, of course, be raised with any
normative standard. The use of normative levels of
performance as a criterion for evaluating change implies a
satisfaction with these levels. Nevertheless, the issue is
that many people working with the client in the natural
environment would accept average behavior, especially if it
has previously been deviant.

Identifying the normative group may also be difficult
for some individuals. While it might be expected that a
child of normal measured intelligence would be able to count
from 1-100, this goal may be unrealistic for a mentally
retarded child. It could also be somewhat arbitrary to
specify those individuals who constitute the normative group.
Should a Mexican-American child have the same normative group
as his/her Anglo peer? As Kazdin (1977) has noted, simply
defining one's "peers" or the normative group obscures many
variables that might be relevant for judging intervention
effects. The professional might want to take into account
such factors as age, SES, IQ, and family environment.

Use of Traditional Assessment Devices in Behavioral
Assessment. Much has been written on the limitations of
traditional assessment practices, both within personality
(e.g., Hersen & Barlow, 1976; Mischel, 1968) and ability
testing approaches (e.g., Bersoff, 1973; Bijou & Grimm, 1975;
[illegible]; Kratochwill, 1977; Mann, 1971; Salvia &
Ysseldyke, 1978; Ysseldyke, 1973). Behavior assessors have
typically rejected various standardized tests of ability and
have instead tended to argue for the use of criterion-
referenced tests (cf. Livingston, 1977) and task analysis
procedures (Mercer & Ysseldyke, 1978).

As was mentioned earlier, one of the aims of traditional
tests is to predict which children might be in need of
special education services. A major limitation of IQ tests
(and other instruments used to diagnose learning problems)
for the professional interested in developing intervention
programs is that the constructors of the tests were really
concerned with large group prediction. One of the main
purposes of psychoeducational behavioral assessment should be
to confound these predictions by remedying the child's
problems. Binet developed the mental test as a screening
device so that "feeble-minded" children could receive special
education in the Parisian schools. However, what was
significant is that no one seems to know how successful they
were (i.e., did those children end up better off than just
sitting it out at the bottom of the regular classes?). Also
noteworthy is that the research comparing the academic
achievement of children in special education classes versus
regular classes has yielded equivocal results (e.g., Blatt,
1958; Cassidy & Stanton, 1959; Goldstein, Moss, & Jordan,
1965). Moreover, it is the conceptual shift from prediction
to potential that has obscured the disadvantaged, race,
intelligence, compensatory education debate (Cronbach,
1975b). Nevertheless, the notion of using the IQ score as a
measure of "intellectual" potential is strong despite the
fact that the predictive validity of the IQ score for most
special populations is largely unknown.

What use can the behavior assessor have for the IQ test?
Evans and Nelson (1977) suggest four qualities that might be
considered, namely, the standardization feature, goal
setting for remediation by behavioral methods, assessment of
items not learned at school, and context of assessment.
Others have noted some positive features of traditional IQ
tests as well (e.g., Ciminero & Drabman, 1978).

Standardized Qualities. Evans and Nelson (1977) suggest
that the standardization feature of tests allows definition
of one's target population relative to others. This may be
relevant in evaluating the outcome of intervention programs.
A second point is that standardized test data also allow
evaluation of the substantive significance of a behavioral
program (cf. Nelson, 1974). This appears to be an
improvement on some behavioral interventions that have tended
to report outcome data in the form of changes on some
arbitrary scale, the meaningfulness of which is unknown.
Staats (1971; 1973) suggested that standardized test scores
provide an additional source of data against which to
evaluate statistically the success of a behavioral program.
While the standardization sample could be conceptualized as a
large control group, Evans and Nelson (1977) note that the
statistical problems inherent in this strategy are
considerable and would require knowledge of the reliability
of the test for the population from which the treated
children were drawn. Moreover, the test-retest reliabilities
typically are low for the kinds of special children treated
by behavioral procedures.

Goal Setting. A second advantage of standardized tests
that has been raised is that, given well-constructed age
norms, they can reveal an area of deficit and thereby help
set academic goals for remediation by behavioral methods
(Bijou, 1971). Evans and Nelson (1977) suggest two problems
with this approach. First, one problem is to ascertain by
how much a score on a particular subtest has to deviate
before the child can be thought to have a serious deficiency
in the area. The answer to this is again related to a
statistical issue regarding error of measurement of the
individual subtests and the scatter of the scores obtained.
A second problem relates to the test item. A child who does
poorly on visual sequential memory from the ITPA will likely
be referred for training in "visual memory", but as we have
noted above, the implication for reading instruction will
likely be tenuous. Moreover, item content for prediction may
not relate to instructional goals. Unfortunately, despite
the fact that test constructors insist that IQ tests should
not be used as tests of cognitive abilities (cf. Wechsler,
1975), they continue to be used for these purposes.
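
The first problem noted above, how far a subtest score must
deviate before it signals a deficiency, can be made concrete
with the standard error of measurement from classical test
theory, SEM = SD * sqrt(1 - r), where r is the reliability of
the subtest. The Python sketch below is an illustration only.
The scale (mean 10, SD 3), the reliability of .75, and the
function names are assumed values chosen for the example; they
do not describe the ITPA or any other published test.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)


def score_band(observed, sd, reliability, z=1.96):
    """Approximate 95% band around an observed score (z is assumed)."""
    sem = standard_error_of_measurement(sd, reliability)
    return observed - z * sem, observed + z * sem


# Hypothetical subtest: scale mean 10, SD 3, reliability .75.
scale_mean = 10
low, high = score_band(observed=8, sd=3, reliability=0.75)

# If the band still contains the scale mean, the apparent deficit
# cannot be separated from measurement error alone.
print(round(low, 2), round(high, 2), low <= scale_mean <= high)
```

In this example the band around an observed score of 8 runs
from roughly 5.1 to 10.9 and therefore still includes the
assumed scale mean, so a two-point shortfall on a subtest of
this reliability would not by itself establish a deficiency.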

Item Content. Many standardized tests of ability (e.g.,
IQ) include/assess knowledge of items not learned exclusively
at school. Thus, another use of traditional ability tests
would be that they allow one to compare what the child has
learned generally (as measured in an IQ score) with what
he/she has learned at school (e.g., some score on a school
achievement test) (Evans & Nelson, 1977). Evans and Nelson
(1977) note that a statistically significant difference
between the two measures, with the achievement measure
lower, would suggest remediation of rather general classroom
learning and studying skills. Unfortunately, there are some
problems with this reasonable suggestion. First, there is
some overlap between the two measures, so for a test of
ability/achievement differences one would have to sort out
specific items for further analysis. Second, a statistically
significant difference may not be a meaningful difference.
Finally, a point recognized by Evans and Nelson (1977) is
that typical achievement measures are sometimes so general
that remedial efforts could be quite misdirected even if they
did correspond to a curriculum in the classroom, which they
typically would not.
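
The notion of a statistically significant ability/achievement
difference can be sketched with the standard error of the
difference between two scores expressed on the same scale,
SE_diff = SD * sqrt(2 - r_a - r_b), where r_a and r_b are the
reliabilities of the two measures. The fragment below is a
sketch under assumptions, not a procedure proposed by Evans and
Nelson (1977): the common scale (SD 15), the reliabilities of
.90 and .85, the scores, and the function names are all
hypothetical, and the formula assumes uncorrelated measurement
errors.

```python
import math


def se_difference(sd, rel_a, rel_b):
    """Standard error of the difference between two scores on the
    same scale, assuming uncorrelated errors of measurement."""
    return sd * math.sqrt(2.0 - rel_a - rel_b)


def is_reliable_difference(score_a, score_b, sd, rel_a, rel_b, z=1.96):
    """True if the observed gap exceeds z standard errors (z assumed)."""
    return abs(score_a - score_b) > z * se_difference(sd, rel_a, rel_b)


# Hypothetical case: ability score 100, achievement score 88.
print(round(se_difference(15, 0.90, 0.85), 2))
print(is_reliable_difference(100, 88, sd=15, rel_a=0.90, rel_b=0.85))
print(is_reliable_difference(100, 80, sd=15, rel_a=0.90, rel_b=0.85))
```

Under these assumed values a 12-point gap does not exceed the
critical difference while a 20-point gap does. Even a gap that
passes this check establishes only that the difference is
unlikely to reflect measurement error; whether it is large
enough to matter educationally remains a separate judgment.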

The argument that one could compare scores on one IQ
test with those obtained from a more "culture reduced" test
to estimate the degree of deficiency in skills specific to
the dominant culture seems quite reasonable (cf. Evans &
Nelson, 1977). For example, a test administered in both
standard and nonstandard English could yield a discrepancy
that would allow the professional to determine if the
minority group child had a "cognitive deficiency" or a
limited knowledge of standard English (Quay, 1971).
Nevertheless, to hypothesize a cognitive deficiency may not
be as useful as determining what skills (e.g., on some test
of academic skills) are deficient under different language
conditions. However, the notion that standardized assessment
under different stimulus and response conditions is an
alternative assessment strategy has been raised and could
provide the behavioral assessor useful information in
planning an intervention program (cf. Kratochwill, 1977). As
Evans and Nelson (1977) observe, "...showing that a child
from a different culture fails a test presented in one way
(the typical Western European fashion), but passes a similar
test presented in another way (using more familiar stimuli),
is an assessment of the importance of those stimulus
variables for a given task" (p. 640) (see also Cole, Gay,
Glick, & Sharp, 1971; Price-Williams, 1966; Piersel, Brody, &
Kratochwill, 1977).

Testing Context. Children develop a set of complex
skills which are employed in varying degrees during the
administration of a standardized test. Evans and Nelson
(1977) note that what can be a major problem for the
comparison of test scores across cultures, subcultures,
ethnic groups, or social classes can be useful to the
professional because the testing situation represents an
opportunity to observe the child's style of behavior on
cognitive tasks. Nevertheless, they suggest that one major
limitation of this procedure is that such observational
categories are subjective and frequently no reliability
measures are taken. While some scales are specifically
developed for this purpose (e.g., Sattler's (1976) Behavior
and Attitude Checklist for IQ Testing), the psychologist
would need to construct specific scales for different tests.
A second problem is that many of the tasks are not well
designed to tap the child's problem-solving strategy. For
example, it would be difficult to completely analyze the
problem-solving activity of a child completing the WISC-R
block design without inclusion of covert verbal statements
that accompany performance actions.

Thus, it must be stressed that within a behavioral
analysis, IQ test performance (and performance on any test
generally) is a function of the test situation and a child's
interactional history (Bijou, 1976). Within the test
situation, performance will be a function of the test items
and setting factors in test taking.


Summary and Conclusion

In this chapter we reviewed some alternatives that have been
proposed to traditional assessment practices. Traditional assessment
practices have usually involved a relatively standardized battery of
assessment such as standardized I.Q. and personality measures. A
number of alternatives have been proposed, including culture-reduced
testing, renorming, adaptive behavior, Piagetian assessment procedures,
learning potential assessment, diagnostic clinical teaching, child
development observation, and clinical neuropsychological assessment. In
each of these areas we found a rather limited amount of empirical
research addressing the issue of how these strategies can actually
reduce bias in assessment. In some cases there are conceptual and
methodological problems in the research. In other areas there is no
strong evidence to suggest that any of these things result in better
services to children as a function of their inclusion in the assessment
process. In order to address this concern, more empirical research
taking into account these different alternatives needs to be conducted.

A rather extensive discussion of behavioral assessment strategies
was included in the chapter because these techniques have been
relatively ignored in the test bias literature. Indeed, there has been
a paucity of information relating behavioral assessment techniques to
an expanded framework for assessment to reduce bias. After reviewing a
conceptual framework for behavioral assessment techniques, we provided
a review of specific techniques including interview, self-report and
behavioral checklists and rating scales, self-monitoring, analogue
assessment, direct observational assessment, psychophysiological
assessment, criterion-referenced assessment, and the use of more
traditional assessment procedures within behavioral assessment.

Behavioral assessment methods, on the surface, look like
procedures that are relatively useful in expanding the framework for
assessment in educational settings with minority and non-minority
children. One of the major advantages that some of these procedures
have (e.g., direct observational measures) is that they can be used
over different phases of the assessment process to determine how
effective services are. This feature of behavioral assessment is
perhaps one of the stronger characteristics that needs to be emphasized
in future assessment work. In addition, criterion-referenced
assessment holds great promise for work in this area; however, as noted
in this section of the chapter, there are numerous conceptual/method-
ological features of this form of assessment that have not directly
addressed the issue of bias in assessment.

Again, we must conclude that even in the behavioral assessment
area, where there is strong promise of reducing bias in the assessment
process, little empirical research has been conducted on this specific
topic. Indeed, the field of behavioral assessment lacks any conceptual
framework for dealing with test bias in a systematic way. One of our
strong recommendations is that the task of providing a conceptual
framework for research must be undertaken in the future.

Finally, although a number of alternatives to traditional
assessment have been proposed, at this time it is likely that these
procedures should be considered as adjuncts to more traditional
assessment until data indicate that any procedure, or a combination of
procedures, can make a strong contribution to reducing bias in the
assessment process.

Ethical and Legal Considerations


Assessment and treatment of children's learning and emotional problems
necessitates a discussion of ethical and legal considerations, especially as
they relate to the assessment of culturally and linguistically diverse
populations. Specifying that a child has a learning or behavior problem
raises issues of labeling and the possibility of professional intervention
(e.g., consultation, special education services). Once a judgment has been
made that a problem exists, some professional may become involved in
attempts to assess and treat the problem. Sometimes, depending on the type
of case, a research investigation may also be considered. Any of these
procedures involves something intrusive for the child and may expose him/her
to a range of risks and possible inconveniences. A child's participation
in assessment, treatment, and/or research may involve the following
potential intrusive influences: (a) privacy of the child (and parent) may
be invaded; (b) personal resources (e.g., time, money) may be used; (c)
personal autonomy may be sacrificed; (d) the client (and family) may be
exposed to physical and/or psychological pain and discomfort; (e) permanent
physical and/or psychological damage may occur (Stuart, 1981). Due to
these potential negative influences, various guidelines, laws, and moral
codes have been developed. In this chapter we review the ethical and legal
issues relevant to non-biased assessment of children experiencing learning
and behavior problems. These considerations are presented within the
context of assessment, treatment, and research with school age populations.

Ethical and Legal Issues: The Context
The assessment procedures reviewed in earlier chapters offer some
promise for achieving various therapeutic goals in work with children
experiencing learning and behavior problems. Yet, the very fact that these
procedures can be used to change feelings, cognitions, and behavior of the
child and his/her family raises numerous concerns over the relationship
between professional and client(s). Concern for the nature of a
therapist-client relationship is not new. Perhaps since the days of
Hippocrates individuals have been sensitive to the special nature of the
relationship that exists. When one person goes to a designated
professional for help, he/she is vulnerable to potential abuse. Over the
years, scholars, professional groups, and the courts have raised issues
over the nature of assessment and its potential impact on the consumer.
Three sources of guidelines (influence) have been established for
professionals involved in therapeutic assessment and intervention (Stuart,
1981): (a) law, (b) ethics, and (c) morality. These influences provide a
conceptual guide for the professional involved in assessment, intervention,
and research with minority and nonminority children.

Laws

The future of many assessment and therapeutic programs for children
experiencing academic and social disturbances is increasingly being
influenced by legal precedents established in the judicial system. Laws
provide one of the strongest influences on professional behavior.
Generally, laws, whether by statute or case, represent formal principles
that govern conduct, action, or procedure. In this sense they can be
viewed as guidelines for activities among professionals. In fact, laws
establish, in some cases, who can be a professional by judging who can
call themselves a physician, lawyer, psychologist, counselor, and so forth.

Unfortunately, as Stuart (1981) aptly notes, laws have typically been
proscriptive rather than prescriptive. They have typically specified
sanctions and penalties for misconduct rather than established guidelines
for positive and acceptable activities. Another characteristic of laws is
that they have typically been reactive rather than proactive. In most
cases laws have come into effect subsequent to some misconduct or misdeed
rather than being enacted to prevent various problems, although they may
prevent future behavior of the same kind (e.g., discriminatory assessment
practices). For these reasons, laws can be regarded as somewhat incomplete
guidelines for professionals. Although their formal establishment may
represent a potent source of influence, they may be too specialized to
provide the kind of information the professional needs in his/her everyday
assessment and therapeutic activities.

Ethics  and  Rights 

Ethics are something that nearly every professional agrees they should
have, but agreement over the definition and scope of ethics remains a
continual source of controversy. Morality is often implied in definitions
of ethics, as this dictionary definition implies (cited in Krasner, 1976,
p. 631):

     "The study of the general nature of morals and of the specific
     moral choices to be made by the individual in his relationship
     with others ... The rules or standards governing the conduct of the
     members of the profession ... Any set of moral principles or
     values ... The moral quality of a course of action; fitness;
     propriety (American Heritage Dictionary, p. 450)."

Within this context, the assessor/clinician must make decisions on the
basis of what is good and bad for the specific individual, as implied in
definitions of morality (Krasner, 1976). In these decisions, the treatment
issues of control and prediction of behavior emerge. Braun (1975), for
example, raised the following concerns with behavior modification: "Who
shall have the power to control behavior?"; "Towards what end shall the
control and power be used?"; "How shall the power to control behavior be
regulated?" Such issues are, of course, not specific to behavior therapy,
and can be seen to emerge in any of the assessment and intervention
approaches used for the provision of children's special educational
services.

Sometimes a distinction is made between ethics and human rights
(Morris & Brown, 1982). For example, if the means used to assess or treat a
child with a severe behavior disorder were intolerable to this child (e.g.,
modeling, punishment, role playing) and s(he) elected not to be involved in
the program, a rights question would emerge. In contrast, the professional
may become involved in an ethical decision when deciding which assessment/
intervention procedure would work with the specific type of problem
experienced by the child. Additional concern may arise when assessing
minority group children. For example, certain minority group children may
feel less comfortable in testing situations (see Chapter 5) and may have
their rights violated more easily than nonminority group children.

It is obvious that rights and ethics overlap in practice. Thus,
failure to consider a human rights issue would be regarded as unethical.
Yet, the manner in which ethical and human rights codes provide guidelines
for professional behavior varies and is not often uniform across disciplines
(e.g., psychology, special education).

Moral principles represent an influence on professional behavior
inasmuch as they provide guides for conduct that transcend specific laws
and ethical codes. Moral principles refer to "...some absolute assumptions
about the rights and responsibilities of individuals" (Stuart, 1981, p.
717). As noted above, morality plays a central role in ethical guidelines.
No assessment strategy or therapeutic model is free from scrutiny on
ethical and moral grounds, but the techniques derived from any particular
approach do not imply a particular ethical or moral approach. Bandura
(1969, p. 112) raises this issue:

     In discussions of the ethical implications of different modes of
     achieving personality changes, commentators often mistakenly
     ascribe a negative morality to behavioral approaches, as though
     this were inherent in the procedures. Social-learning theory is
     not a system of ethics; it is a system of scientific principles
     that can be successfully applied to the attainment of any moral
     outcome. In actuality, because of their relative efficacy,
     behavioral approaches hold much greater promise than traditional
     methods for the advancement of self-determination and the
     fulfillment of human capabilities. If applied toward the proper
     ends, social-learning methods can quite effectively support a
     humanistic morality.

Morality is, of course, the issue that emerges in determining what are
proper ends of any assessment or therapeutic procedure.

It should be emphasized that various ethical codes of the professions
will not provide guidelines for all issues that emerge in treatment,
assessment, and research. Even with the best ethical codes, individuals
will need to embrace basic moral principles for human conduct. But basic
moral principles vary within and across cultures, and may even be more
subjective (and less easily identifiable) sources of guidance and
regulation. Presumably, moral thinking is the basis for development of
codes of professional conduct, but this has not always been specified.

Issues  in  Assessment 
Virtually all special education programs established for children
experiencing learning and behavior disorders involve some type of formal or
informal testing and assessment. Although some individuals have made a
distinction between testing and assessment (e.g., Mahoney & Ward, 1976;
Salvia & Ysseldyke, 1981), we will be using the terms interchangeably (see
discussion in Chapter 1). However, as we will see, an important issue
that emerges in psychological and educational assessment when legal and
ethical issues are embraced relates to the psychometric credibility of the
procedures as well as the use to which they are put in making decisions
about child intervention. For example, in 1972 it was estimated that more
than 250 million tests in the area of academic skills, perceptual and motor
functioning, social-emotional functioning, and vocationally oriented skills
were administered in education (Hohman & Docter, 1972). In 1975, when
Congress passed Public Law 94-142 (20 U.S.C. 1401-1461), The Education for
All Handicapped Children Act, large numbers of normal and handicapped
children experienced assessment from a variety of school based
professionals (e.g., school psychologists, speech therapists, counselors,
special educators).

The rapid proliferation of assessment has raised consciousness over
the implications this activity has for individuals participating in it.
Increasingly, individuals outside the professional community have become
quite critical of testing practices and procedures. Three books were quite
instrumental in alerting the public to sources of controversy and problems
in testing: The tyranny of testing by Banesh Hoffman (1962); The brain
watchers by Martin Gross (1962); and They shall not pass by Hillel Black
(1963). Concerns over testing have also grown in the fields of psychology
and education, especially as they relate to assessment and treatment of
minority group children (see Chapter 1). Organizations have established
formal guidelines for assessment and even statements of policy on proper
use of tests (see Chapter 9). A more recent involvement has come from the
courts, who have been asked to decide on the utility of testing practices in
making decisions about services for children and youths. In this section
of the chapter, we deal with how these issues influence assessment
activities for both minority and nonminority group children.


Criticisms  of  Assessment 


Assessment typically involves some type of relationship between the
assessor and the assessee. This relationship may not always be known to
the assessee, especially with regard to the potential consequences the
information gathered during assessment may have. Numerous criticisms have
been advanced against assessment practices and instruments. Among the more
common are the allegations that assessment represents an invasion of
privacy, assessment may create an unfavorable atmosphere, assessment
results in labels, and assessment may be discriminatory against certain
groups.

Invasion of Privacy. The right of privacy is embedded in the U.S.
Constitution, but remains remarkably ambiguous in some areas of practice.
There appear to be two somewhat overlapping aspects that emerge in the
privacy concept (Bersoff, 1978). First is the right not to suffer
government prohibition as a result of engaging in private activity. The
second is the right to be free from government gathering, storage, and
dissemination of private information (see Dorsen, Bender, & Neuborne,
1976). Extending this concept beyond the governmental context, Reubhausen
and Brim (1965) offer the following definition:

     "The essence of privacy is ... the freedom of the individual to
     pick and choose for himself the time and circumstances under
     which, and most importantly, the extent to which, his attitudes,
     beliefs, behavior and opinions are to be shared with or withheld
     from others. The right of privacy is, therefore, a positive claim
     to a status of personal dignity, a claim for freedom ... of a very
     special kind" (pp. 1189-1190).

The essence of the issue in any type of assessment is the right of the
person to determine what type of information of a personal nature will be
shared with others. Consider the following situation that might be
involved in assessing a minority child who is extremely socially withdrawn.
A psychologist, believing that data are needed on the peer perspectives on
the child, might administer a sociometric scale to the child's classmates
during regular class sessions. Although cooperation of the school has been
obtained, voluntary consent of the students and parents has not been
sought.

This situation involves consideration of several issues that are
problematic. First of all, informed consent was not obtained from the
students (see later discussion of the informed consent principle). Second,
the information obtained on the test scale was likely of a highly personal
nature and many students may have considered it an invasion of privacy to
provide it. Third, it was not specified how the information was to be
used. That is, were the data to be disseminated to any personnel with
identifying information and, if so, to whom? Consider further that the
psychologist might be asked by the school officials to intervene with other
students who are reported to be withdrawn or friendless. The issue of the
psychologist offering his/her services to an extremely withdrawn child may
emerge. Therefore, the individual's personal privacy may be involved.

Invasion of privacy is then a broad concept that involves several
issues, including informed consent, confidentiality, and even psychological
stress. Invasion of privacy is more complex when children are involved
than when adults are involved. Privacy rights guaranteed by the
Constitution are granted to adults and generally not to children. Although
the courts have granted some privacy rights to children (e.g., Tinker v.
Des Moines School District, 1969), there may be considerable compromise
where the child's interest is at stake (Bersoff, 1978).

Privacy issues are also raised in the use of unobtrusive measures in
assessment of learning and behavior problems. Usually, unobtrusive
measures are taken without the client's awareness so as to avoid
sensitizing them (see discussion in Chapter 7). Yet, obtaining such
measures may violate the requirements of informed consent and may be
perceived as an invasion of privacy. Assessors might consider several
alternatives in this area. First of all, for some unobtrusive assessments,
the issue of consent may not emerge. Many archival records would be
publicly available and could be used without any personal identification.
For example, the grades a child receives as part of his/her regular
evaluation or the number of times the child is sent to the principal's
office are examples that represent a minimal invasion of
privacy. As a second possibility, the child's parents could provide
consent for several different types of assessment opportunities, only some
of which would be used (Kazdin, 1979). A third option sometimes advanced
is to go ahead and conduct the unobtrusive assessments and inform the
child/parent subsequently that he/she has the option to have such
information remain confidential. Yet, this option may not be acceptable
given the possibility that assessment was initially objectionable (Kazdin,
1979). Moreover, once the information is obtained, it could conceivably
cause a threat to privacy, especially in cases where it has an important
bearing on the decision making process or where legal issues are involved
(e.g., discrimination, access to special services).

The decision to require students to take tests or examinations
(especially those involving personality and/or attitudes) could be made
within the context of a panel that considers some of the implications
involved. Specifically, the following might be considered:

1.  The ability of the test to measure precisely those objectives
the school or district intends to measure.

2.  The possibility of embarrassing or emotionally damaging
children who take the test.

3.  The extent to which community mores and values are likely to
be affected by the test.

4.  The potential benefits of testing.

5.  The possibility of using volunteers instead of captive
audiences.

6.  The steps that will be taken to ensure confidentiality of
results, and

7.  The possibility of obtaining data without testing (using
census reports or public documents, for example) (Sax, 1974,
pp. 26-27).

In addition to the above, we would add the need to determine the
potentially biased nature of the test when used with children from diverse
cultural and language backgrounds.

It is recognized that the invasion of privacy issue extends well
beyond testing (American Psychological Association, 1981). Many routine
activities in our society are aimed at gathering information that is
commonly regarded as personal (e.g., opinion polls or credit card
applications). Material provided by students in school as part of regular
class activities might also be regarded as such. Yet, it is the
responsibility of the professional to follow the legal directives and
ethical guidelines advanced that have a bearing on such issues (see later
sections in this chapter).

Tests Create an Unfavorable Atmosphere. Another criticism of tests
has been that they may create an unfavorable atmosphere for the client or
student involved in taking them. Indeed, the requirement that children and
youth participate in formal test-taking has created a whole literature on
treatment methods to reduce test anxiety. Test anxiety has been a source
of some discussion in the professional literature (e.g., Johnson, 1979;
Morris & Kratochwill, 1983; Phillips, 1978; Sarason, 1980; Tryon, 1980) and
refers to "... an unpleasant feeling or emotional state that has
physiological and behavioral concomitants and that is experienced in formal
testing or other evaluative situations" (Dusek, 1980, p. 88). It is
possible that test anxiety is associated with cognitive and attentional
processes that interfere with task performance, although this does not
always occur.

Administration of certain individual standardized tests (e.g., IQ) has
also been criticized for inducing an unfavorable atmosphere that may result
in discriminatory practices toward certain ethnic and racial groups (see
Kratochwill, Alper, & Cancelli, 1980). It is sometimes assumed that if a
child does not perform well on individual tests, the results could be an
inaccurate reflection of classroom performance (Reschly, 1979). Factors
accounting for poor performance on tests may be related to motivational
factors or situational anxiety generated by the test or the test environment
(e.g., an unfamiliar situation, examiner). For example, Piersel et
al. (1977) found that a pretest vicarious situation in which minority group
children viewed a seven-minute videotape of a white examiner testing a
minority child under positive conditions (e.g., praise) resulted in only
14.3% of the WISC-Revised (WISC-R) scores being 1 SD below the mean,
whereas 42.8% and 52.4% of the scores were 1 SD below the mean under
standard administration and feedback conditions, respectively. Although
the findings were discussed within the context of motivational factors,
specific anxiety components could also be invoked to explain the
performance differences. Children viewing a pretest vicarious interaction
may experience reduced levels of anxiety that then have a positive
influence on performance.

Of course, deliberate attempts to create anxiety or fear among
children during testing situations may prove unethical. Yet, the
administration of tests to children is a commonplace event in the school
and community. A major issue here is that efforts should be made to reduce
the severe negative influences that accompany testing. A rather large body
of literature suggests some useful intervention procedures for this problem
(see, for example, Tryon, 1980). In the area of assessing children's fears
and phobias, the psychologist must consider that administering some
particular assessment device could have a negative impact on the client.
Two considerations emerge from this type of assessment. First, attempts
should be made to reduce any negative emotional aspects that surround the
assessment procedure. This would be in accord with sound ethical practice
where stress should be minimized. Second, the assessor must consider that
any anxiety created by the testing itself may lead to inaccurate results
and hence could possibly lead to misguided intervention procedures.

Assessment Results in Labels. A major objection to assessment is that
it frequently results in labeling the child in a way that may prove
destructive. Concern over the labeling process has been extensively
discussed in the professional literature (e.g., Gordon, 1975; Guskin, 1974;
Hobbs, 1975a, 1975b, 1975c; MacMillan & Meyers, 1979; MacMillan, Jones, &
Aloia, 1974; Mash & Terdal, 1981; Mercer, 1973, 1975; Ross, 1980; Rowitz,
1974). Yet, labeling children may have a number of positive features
(Rains, Kitsuse, Duster, & Friedson, 1975). First of all, labels may help
summarize and order observations, which in turn helps professionals
communicate. For example, professionals with diverse backgrounds can talk
about "organic mental disorders" (DSM-III) and have some general
understanding of what is involved in the problem. Second, labels may in
some cases facilitate treatment strategies for a particular disorder.
Given that a learning disorder can be reliably diagnosed, several of the
available treatment approaches described in the professional literature
could be employed. Third, labels may serve as an organizer for scientific
research (e.g., epidemiological, etiological, and treatment) on a
particular disorder. Fourth, labels may serve as a reference point for
tolerance or acceptability of childhood behavior (Algozzine, Mercer, &
Countermine, 1977).

More often, negative features of the labeling process have been
raised. Concern over labeling became especially acute during the late
1960s and early 1970s as legislation emerged dealing with this issue. The
perceived negative by-products of labeling have been of particular interest
when discussing the disproportional representation of minority group
children in educational diagnostic categories. In addition to general
concerns over diagnostic classification systems, increased attention has
been focused on how children are labeled in the schools. Such growing
attention has been due in part to (a) the need to classify students for
certain purposes (special services) and to assign names to these
classifications, (b) the non-average characteristics of certain groups
(emotionally disturbed), and (c) the propensity to associate children with
the name of the group they have been assigned to (MacMillan & Meyers,
1979).

When a decision is made to assess a child experiencing a learning or
behavior disorder, several potential issues emerge in the process. First
of all, a possible concern of clients and their caretakers relates to the
possible label, diagnosis, or classification that may ensue. In such cases
it is not so much the labeling process itself as it is the potentially
negative influences associated with the label (Pruch, Engel, & Morse, 1975;
Smith, 1981). In our culture the use of some formal label may frequently
be associated with the assignment of a negative value such as "sick",
"disturbed", "mental", and so forth. These values may further cause
emotional suffering on the part of the child and/or parents. Second,
beyond the specific concerns with labeling per se, there may be long term
negative consequences associated with the labeling process, such as lack of
a regular education and denial of employment, among others. Third, the use
of formal diagnostic classification systems (e.g., DSM-III, PL 94-142) may
lead to a violation of human rights in that the labels employed may be
shared with others without the informed consent of the client (Smith,
1981). Smith (1981) notes that clients are not usually informed that when
they sign confidential information release forms they are often giving
blind consent to release specific diagnostic classification data. Of
course, this could occur for both the parent and child. Smith has argued
that the APA should amend the ethical guidelines for provision of
psychological services to take into account this aspect of practice.

Assessment of children experiencing learning and emotional problems in
school settings may lead to special class placement as well as labeling
(e.g., "emotionally disturbed"). Questions can be raised over the efficacy
of this process, and some empirical work has appeared in the field of
psychology and education on this issue. The primary concern here is
whether or not labels such as "emotionally disturbed" have an adverse
effect on the child in the sense that labels may bias the professional
toward seeing more pathology or deviance than otherwise would have been
perceived without the label. While some writers have noted the potential
negative effects of labels (e.g., Catterall, 1972; Reynolds & Barlow,
1972), there is no uniform evidence for this negative influence. As Mash
and Terdal (1981) observed, some research has shown that a particular child
behavior, when believed to be exhibited by a "disturbed child", may produce
different reactions than when believed to be exhibited by a "nondisturbed
or normal child" (e.g., Stevens-Long, 1973). Also, other research has shown
that observers may tend to overestimate negative behaviors in a group of
children labeled behaviorally disturbed whereas they underestimate negative
behavior in a group labeled normal (Yates, Klein, & Haven, 1976). Several
reviews of the literature have not found support for the adverse effect of
labeling (e.g., MacMillan & Meyers, 1979; MacMillan et al., 1974; Guskin,
Bartel, & MacMillan, 1975; Kratochwill et al., 1980; Reschly, 1979). Some
discrepancies in this area may be an artifact of the methodologies employed
(Reschly, 1979). As an example, in studies where college students or
teachers are provided only the label and/or only brief exposure to
the labeled child, a relatively large expectancy effect is found (Ysseldyke
& Foster, 1978). Yet, in studies employing the same basic methodology but
a more lengthy exposure to the labeled child, the expectancy effect is
either diminished over time or is not found (Reschly & Lamprecht, in press;
Yoshida & Meyers, 1975). Moreover, the fact that many studies have not
been carried out in the clinical setting threatens the external validity of
this empirical work (Kratochwill et al., 1980).

Concerns have also been raised over a possible "self-fulfilling
prophecy" effect. The issue here would be whether or not a child labeled
as disordered will be perceived in a negative manner, thereby contributing
further to the problem. Research in this area has not clarified the issue.
For example, Rosenthal and Jacobson's (1968) work has been criticized for
methodological inadequacies (e.g., Elashoff & Snow, 1971; Humphreys &
Stubbs, 1977; MacMillan et al., 1974; Snow, 1969; Thorndyke, 1968) and at
least some research with the emotionally disturbed label (e.g., Foster,
Ysseldyke, & Reese, 1975) has not supported the self-fulfilling prophecy
notion.

Conceptual problems with the potential negative impact of labeling
have also been raised. The label itself may not be the sole cause of
negative experiences that are presumed to be associated with it. Thus
prelabeling behaviors exhibited by the child that led to labeling, as well
as consequences associated with the label, may account for the negative
influences. As Mash and Terdal (1981) note, the informal labeling process
and interpretations of formal labels by various individuals (parents,
teachers, etc.) may have the greatest impact. Once a child has been
labeled emotionally disturbed (e.g., emotionally handicapped) it is
conceptually impossible to attribute the negative influences to the formal
labeling set (MacMillan & Meyers, 1979). Thus, the labeling process
appears to be conceptually complex and any variance attributed to the
labeling experience must take into account several factors noted by
MacMillan et al. (1974):

1.  Prelabeling experiences.

2.  The effect of the label versus the perceptions of the
services received by the individual once labeled.

3.  The effects of formal versus informal labels on the measure
of interest as well as the agency and individuals who append
the label.

4.  Cases where children carry multiple labels simultaneously,
such as the delinquent-EMR, or disadvantaged-LD.

5.  The response of the child and the family to the appended
label; most noticeably whether they deny or accept its
validity or whether the salient attribute is highly valued by
the child's subcultural group (MacMillan & Meyers, 1979, pp.
179-180).

It appears clear that future research will be important in sorting out
the conceptual and methodological issues in this area. It does appear that
assessment will continue to result in diagnosis, classification, and
labeling. As Achenbach (1974) has noted, "The basic question is not
whether to classify, but how to classify" (p. 543). Many of the issues in
the literature have been advanced toward traditional assessment and
diagnostic systems. Although behavioral assessment sometimes leads to
formal diagnosis and labeling (see Chapter 3), it is unclear what influence
the emerging behavioral assessment/classification schemes will have on
children. Some evidence suggests that behavior therapists may be less
easily biased by labels (e.g., Langer & Abelson, 1974), but much work
remains to be done on this issue. When research is conducted, it will be
productive to consider the already developed conceptual and methodological
issues advanced in this area.

Assessment Results in Discriminatory Practices. A central criticism
of assessment is that it leads to practices that are believed to be
discriminatory against certain individuals or groups, usually minority
racial and ethnic groups (Alley & Foster, 1978; Flaugher, 1978; Kratochwill
et al., 1980; Oakland & Matuszek, 1977; Reschly, 1979; Sattler, 1974).
Minority children frequently face the problem of misclassification in
school systems with black children being three times as likely as white
children to be placed in classes for the educable mentally retarded (see
SCF [1980]; and the report by the Education Advocates Coalition on Federal
Compliance Activities to the Education for All Handicapped Children Act
[Public Law 94-142, April 16, 1980]). The concept of nondiscriminatory or
non-biased assessment has been a central theme in recent federal
legislative and judicial actions that have provided guidelines for
assessment practices.

The concept of nondiscriminatory assessment has invoked two primary
legal, ethical, and moral issues, namely assessment of minorities and the
use of certain traditional assessment devices and procedures (e.g., IQ
test) in this testing process. As noted in Chapter 1, traditional
assessment procedures have been the primary focal point of criticism with
several alternatives examined. Among the more common recommendations in
the area of child assessment that we discussed in Chapter 7 have been the
call for a moratorium on conventional tests, elimination of special class
placement, language translation in testing, use of minority group
examiners, modification in test procedures (e.g., providing reinforcement),
and creation of so called "culture fair" tests. Although it is beyond the
scope of this chapter to review each of these proposed alternatives within
an ethical-legal context, it should be noted that each presents numerous
conceptual and methodological problems.

Perhaps the major limitation in work in the field of nondiscriminatory
assessment has been the conceptualization of discrimination within the
context of minorities and traditional testing. As an alternative,
assessment procedures should be evaluated on dimensions of discrimination
within the context of how they influence children, regardless of race or
cultural background (Kratochwill et al., 1980; Reschly, 1979). Within this
proposal is the thesis that assessment that results in equally effective
interventions for both minority and nonminority group children has met the
spirit of being nondiscriminatory. When assessment practices result in
differentially ineffective services across groups, the possibility of
discriminatory practices must be considered.

Legislative and Judicial Influences

Psychology and education are becoming increasingly regulated. This is
especially true in the area of assessment activities occurring in these
fields. At one time, the courts were not involved in examining the
assessment activities of psychologists and educators. One reason for this
stance was that the courts pleaded lack of expert knowledge (Bersoff,
1981). But this is definitely changing as reflected in decisions of the
Supreme Court, lower federal and state courts, as well as in Congress and
in federal administrative agencies. For example, the Supreme Court has
been involved in such activities as the influence of compulsory education
laws, the requirements of due process prior to application of disciplinary
and academic sanctions, and the allocation of financial resources to "poor"
schools. The lower federal and state courts have rendered decisions on
such areas as the right to education for handicapped pupils, appropriate
identification of learning disabled children, the right of schools to
expel disruptive handicapped students, and assessment of minority group
children. Congress and federal administrative agencies have been active
inasmuch as in 1964, the Civil Rights Act passed by Congress contained
antidiscrimination provisions guarding against discrimination based on
race, color, or national origin. These were followed by the passing of the
Rehabilitation Act of 1973, the Family Education Rights and Privacy Act,
and the Education for All Handicapped Children Act of 1975 (Public Law
94-142).

The reason that the courts have demonstrated a willingness to render
judgments on assessment issues relates to the degree to which the
constitutional rights of an individual have allegedly been violated.
Specifically, in the assessment area the three constitutional principles of
equal protection, due process, and privacy have been the focus of
intervention by the courts (Bersoff, 1981). For example, in the case of
assessment, the right of equal protection has been interpreted (in part) as
the right to an equal educational opportunity and has been used
successfully in some cases (e.g., Mills v. Board of Education of the
District of Columbia, 1972; Penn. Association for Retarded Children v.
Commonwealth of Penn., 1972). For example, a severely disturbed child
would be entitled to a public school education despite the handicapping
condition. In the case of due process, the fourteenth amendment requires
individual notification in a fair and impartial manner where interests
protected by the Constitution are either restricted or rescinded.
Specifically, the due process clause applies where the individual's
interests in life, liberty, or property are being considered. For example,
a school cannot label a child "emotionally disturbed" unless there is a
formal hearing conducted. Thus, the potentially negative consequences of
such labeling must be considered. In the case of the right of privacy, the
judiciary has not defined the concept, but has indicated activities within
its scope. Generally, the concept has been broadened to include freedom
from unreasonable intrusion into family life by individuals providing
mental health services (Bersoff, 1981).

As a consequence of legislative and judicial influences in assessment
practices, several procedural requirements have been established. For
example, the regulations of PL 94-142 represent requirements for the
evaluation of children who might be demonstrating learning and behavioral
disorders and who are being considered for special education services in a
public school setting. Some specific issues that would create a procedural
concern for a psychologist or other mental health professional practicing
in a public school setting involve notice, consent, and access to records
(Bersoff, 1981; Martin, 1979).

Notice. One of the first procedural safeguards that must be met is
formal notice. Such notice cannot be incomprehensible or intimidating and
cannot come after the fact. Public Law 94-142 (45 CFR 121a.504(a)(1)-(2))
requires written notice "a reasonable time before the public agency
proposes to initiate or change (or refuses to initiate or change) the
identification, evaluation or placement of the child or the provision of a
free appropriate public education to the child". For example, in a typical
case of a child being considered for special services due to severe
anxiety, a psychologist must inform the parents at each step of the
process, including assessment. The school would notify the parents that
there may be a problem and that a professional evaluation will be conducted
on the child. Once the evaluation is accomplished (given that consent was
obtained), the psychologist would need to inform the parents what will be
done next (e.g., an intervention program), or that nothing will be done.
Notice would, of course, extend throughout the intervention phases as well.

Notice is not always satisfied easily. It must be written in a manner
that is comprehensible to the parent or guardian. If the parent cannot
read or only reads a foreign language, other means must be employed to make
notice understandable. The specific action proposed for PL 94-142 is:

1.  The proposed action must be stated.

2.  There must be an explanation why the school proposes the
action.

3.  A description must be given of the alternatives which were
considered before the proposed action was decided on.

4.  The reasons must be explained why the other alternatives were
rejected.

5.  Each evaluation procedure, test, record, or report that the
agency will rely on as a basis for the proposed action must
be described.

6.  Any other factors relevant to the agency's proposed action
must be described [45 CFR 121a.505(a)(1)-(4)].

Consent. Notice does not imply consent. Technically, notification
refers to supplying information about impending actions whereas consent
requires affirmative permission before actions can be taken (Bersoff,
1981). Within the context of PL 94-142, consent is required for only four
things (Martin, 1979, p. 102):

1.  The initial evaluation of the child.

2.  The initial placement of the child.

3.  Evaluation before a "subsequent significant change in
placement".

4.  Before release of records to persons not already authorized
to see them.

Like notice, informed consent is sometimes difficult to define in any
technical sense. For example, the question of who can give consent remains
controversial. Although in the case of children one might expect that the
parent would be the typical individual to render consent, open opposition
from the child may cloud the issue (Martin, 1979). In Bartley v. Kremens
and J. L. v. Parham two federal courts required an opportunity for hearing
when parents consented to place youths, over their protests, in institutions.
Generally, it is recognized that informed consent contains three basic
characteristics that must be upheld to meet the spirit of the concept:
knowledge, voluntariness, and capacity. Each of these issues is discussed
in more detail later in the chapter. Essentially, the three concepts are
applicable to informed consent in assessment, treatment, and research.

Access to Assessment Records. Psychologists and other professionals
engaged in assessment activities with children experiencing learning and
behavior problems typically (and hopefully) generate a considerable amount
of data. A question that occurs in this activity is whether or not the
data are accessible under existing law. In the past, psychological
assessment data, test protocols, and client responses have been guarded to
prevent public disclosure. However, much assessment data are now available
to the public, due to the Family Education Rights and Privacy Act (1975).
This act, sometimes called the "Buckley Amendment", has been incorporated, in
part, in PL 94-142. Any educational institution receiving federal funds by
the U. S. Office of Education must allow parents access to the records
maintained on their child and the right to challenge any information
believed to be inaccurate or damaging to the child. In school settings an
educational record refers to records that are directly related to a student
and are maintained by the educational agency or by a party acting for the
agency (45 CFR 99.3).

A question that immediately arises here is whether assessment
information and test protocols are accessible under existing laws. At this
time, the issue is not clear because no cases have ruled to clarify this
point (Bersoff, 1981). One of the key issues is whether or not the
psychologist, psychiatrist, social worker or other professional reveals
information to individuals in the course of providing input on a case. For
example, a psychologist might administer a rating scale or checklist to a
child as part of an assessment to determine if special class placement is
necessary. If responses were revealed in a team meeting to determine class
placement, such information would be considered part of the
educational/assessment record. Presumably, this would also be true in
cases where a psychologist from an outside agency (e.g., mental health
clinic) is providing input in the case. Although the issues are far from
settled, analogous cases in industry (e.g., National Labor Relations Board
v. Detroit Edison) suggest that parents will increasingly be granted access
to psychological assessment material. In Lora v. Board of Education of the
City of New York (1978), the court noted that a failure to provide parents
with "clinical records" from which placement decisions are made does not
follow due process. However, the failure to carefully define clinical
records still makes the issue ambiguous (Bersoff, 1981).

Bias  in  IQ  Testing 

In addition to guidelines promulgated by court, legislative, and
governmental agencies regarding issues related to procedural concerns such
as consent and access to records, recent court rulings have specifically
addressed the issue of bias in the tests used by schools in the
classification and placement of minority group children in classes for the
mentally handicapped.

While not directly addressing assessment techniques, two early cases,
Pennsylvania Association for Retarded Children v. Commonwealth of
Pennsylvania (1971) and Mills v. Board of Education of the District of
Columbia (1972), had a definite impact on assessment in the schools. Both
of these cases were primarily concerned with the rights of retarded
children to a free public education. The decisions in favor of the
plaintiffs indicated that tests could not be used to exclude children from
school as uneducable.

The following year in Louisiana, a ruling in Lebanks v. Spears (1973)
regarding the exclusion of children from public school stated that
districts could not exclude children whom they had decided were
uneducable, and also set some general guidelines for placement in classes
for the mentally retarded. The ruling stated that such placement can only
take place with evidence indicating an IQ below 70 obtained from an
individually administered intelligence test and subnormal adaptive
behavior. In addition, the court ruled that neither measure could be
inappropriately influenced by the sociocultural background of the child.

In another case not directly addressing assessment, Lau v. Nichols
(1974), a California school district was charged with not providing
adequate language instruction to all Chinese-speaking students. The ruling
of the court, however, directly addressed the district's assessment
practices by ordering a task force be set up in the district to insure that
bilingual and non-English-speaking children were properly assessed.

The courts first directly addressed the charge that psychological tests
were biased against black children in Hobson v. Hansen (1967). In this
case the plaintiffs charged that a tracking system in the District of
Columbia public schools was discriminatory in that it led to an
overrepresentation of black children in the lower tracks. The court decided
in favor of the plaintiffs, claiming that the standardized group tests
employed in decision making focused on academic achievement. The court
concluded that in order to track, it was necessary to assess the children's
capacity to learn, something not assessed through the standardized aptitude
tests used by the district. The Hobson decision ended the era of group
ability testing for classification purposes and led the way for a series of
cases that directly addressed the issue of bias in psychological testing.

Two cases that followed Hobson in the 1970's brought attention to the
potential bias in individually administered intelligence tests in assessing
children whose primary language is other than English. In Diana v.
California State Board of Education (1970) the plaintiffs charged that a
disproportionate number of Hispanic children were placed in EMR classes on
the basis of IQ tests that they claimed were unfair to bilingual children.
Evidence was offered that the nine plaintiffs' IQ scores were found to be on
the average 15 points higher when tested by a bilingual examiner. The out
of court settlement resulted in an agreement with the state that all future
testing of non-Anglo-American children be conducted (1) in both English and
the child's primary language, (2) with tests that were not dependent on
unfair verbal questions or vocabulary, (3) by certified school
psychologists, and (4) with an assessment battery that was multifaceted,
including educational, developmental and adaptive behavior measures as well
as intelligence tests. It was further agreed that all Mexican-American and
Chinese-American children who were in EMR classes at that time would be
reevaluated under the new guidelines and that any district that continued
to have a disproportionate percentage of bilingual children in EMR classes
would have to provide the State with an explanation for the disparity.

The second case also addressed bias in individually administered
intelligence tests when assessing children whose primary language is other
than English. In this case, Guadalupe v. Tempe Elementary School District
(1971), the plaintiffs were Arizona Mexican-American and Yaqui Indian
children. A settlement similar to Diana was arranged out of court.

While Diana and Guadalupe focused on bias in using individually
administered intelligence tests with bilingual children, two additional
cases, Larry P. v. Riles (1972, 1974, 1979) in California and PASE v.
Hannon in Illinois (1980), focused on the alleged bias of individually
administered IQ tests in assessing black children. The plaintiffs in both
cases charged that, as a result of the use of individually administered IQ
tests, black children were being disproportionally placed in EMR classes.
Evidence was offered in both cases that re-evaluation of the plaintiffs
with examiners sensitive to the culture from which the child came produced
IQ scores that disallowed identification and placement in EMR classes. In
the Larry P. case, for example, the plaintiffs were readministered the
identical IQ test that led to their EMR classification, but only after
rapport was established between the child and examiner, and the setting was
cleared of distractions. In addition, some items were reworded and the
children's responses were evaluated in the context of what were considered
by the examiners to be intelligent approaches to solving the problems posed
in the items. Despite the similarity of issues and complaints (i.e.,
alleged violation of both constitutional and statutory law) Judge Peckham
in the Larry P. case and Judge Grady in the PASE case reached different
conclusions. Judge Peckham ruled in favor of the plaintiffs and Judge
Grady ruled in favor of the defendants.

In the Larry P. case Judge Peckham found that the California State
Board of Education violated both the constitutional rights of the
plaintiffs under the "equal protection" clause of the Constitution and
statutory laws embodied in the Civil Rights Act of 1964, section 504 of the
Rehabilitation Act, and Public Law 94-142. With respect to the latter,
which required the State to demonstrate the reasonableness of a system that
resulted in discriminatory effects (i.e., disproportionate representation),
Judge Peckham found for the plaintiffs on the grounds that (1) the tests
were culturally biased and (2) there was no demonstrated relationship
between black children's IQ scores and school grades. After listening to
the testimony of expert witnesses, Judge Peckham concluded that there was
not sufficient evidence to support as legitimate the average differences
found between white and black children's performances on IQ tests. He
rejected the arguments that the differences were the result of either
genetic or environmental factors and consequently concluded that the
differences must then be a function of bias in the tests. Judge Peckham
also concluded that there was inadequate validation of the test for use
with black children as called for under PL 94-142. Evidence offered that
showed correlation between IQ tests and standardized tests of academic
achievement was judged inadequate because Judge Peckham perceived a lack
of difference between the measures. Disregarding these findings, Judge
Peckham was left with little validity evidence to judge reasonable the
State's use of the test in labeling and placing black children in EMR
classes.

Judge Peckham also found that the State had violated the right of the
plaintiffs under the "equal protection clause" of the Constitution. To
find the State in violation of this clause the plaintiffs were required to
show that it was the intent of the State to discriminate against the
plaintiffs. Judge Peckham interpreted intent to mean that the State
willfully engaged in a process that it knew would result in
disproportionate representation in EMR classes, classes he labeled as
inferior and stigmatizing. Judge Peckham found that the impact of the
State Department's action with regard to using IQ tests was not only
"foreseeable" but "foreseen", thus allowing for the judgment of intent on
the part of the State.

The ruling of Judge Peckham made permanent an injunction ordered in
19ai banning the use of standardized intelligence tests in identifying
black children for EMR classes. If such tests are to be considered in the
future the State will have to seek the approval of the court. Approval
will be granted on condition that the recommended test is empirically
supported as valid for placing black children, not racially or culturally
discriminatory, and capable of being administered in a nondiscriminatory
manner. In addition, the State was required to re-evaluate all black
children currently in EMR classes who had been placed using standardized IQ
tests and design individual educational plans for all black students
returning to regular classrooms. Finally, the State is required, to date,
to demonstrate the effectiveness of district plans to correct the
proportional imbalance of black children in EMR classes.

Within a year after Judge Peckham's decision in Larry P., Judge Grady
rendered his decision in PASE. In finding in favor of the defendant, Judge
Grady agreed with Judge Peckham that EMR placement was inferior to regular
class placement, but he concluded that there was insufficient evidence that
IQ tests were biased. Judge Grady was unimpressed by expert testimony and
decided in an extraordinary move to judge for himself the validity of the
WISC, WISC-R, and Stanford-Binet. Judge Grady read all questions and
answers from these tests to the court and determined for himself that a
total of nine items were biased, eight items from the WISC and WISC-R and
one item from the Stanford-Binet. Judge Grady concluded that these items
were insufficient to appreciably influence classification and placement
decisions, especially considered among the additional mandated
supporting data for placing children in EMR classes. While he accepted the
argument that any one measure could result in biased decisions, he judged
the entire placement process identified in PL 94-142 to be a sufficient
failsafe system to protect against discrimination. After discounting test
bias as the reason for the discrepancy between the average IQ scores of
black and white children, Judge Grady concluded the difference was caused
by socio-economic factors, indicating that poverty was the culprit for the
lower average IQ scores of black children.

Bersoff (in press) identifies several similarities and differences in
his critique of the Larry P. and PASE cases. Both courts similarly judged
special education programs to be inferior to regular education programs.
Judge Peckham in his decision labeled them "dead-end", "isolating",
"stigmatizing", "inferior", "substandard", and "educational anachronisms."
Judge Grady identified inappropriate placement in an EMR class as an
"educational tragedy" and "totally harmful". In addition, both judges
agreed that a multifaceted assessment is necessary for proper
classification and placement. However, Judge Grady saw the present
process as effective in addressing concerns of bias while Judge Peckham
concluded that, despite the mandate for a multifaceted assessment, in
practice, the IQ score has a disproportionately large impact on EMR
decision making.

Bersoff (in press) identifies the major disagreement in the rulings as
their difference in perceptions of the biased nature of IQ tests. Judge
Peckham concluded that the IQ difference across races was an artifact of
testing, while Judge Grady implied that the differences were real. Bersoff
concludes that regardless of which decision one favors, both must be
considered inadequate given the basis on which they were made. Indeed,
Bersoff (in press), in speaking of Judge Grady's decision, concluded that
"[T]he method by which he reached that judgment was embarrassingly devoid
of intellectual integrity" (p. HR).

Judge Peckham's decision that there should be no difference between
the average black and white IQ scores is premature to say the least. As
reviewed in previous chapters, evidence to date does not support this
conclusion. The ramification of this decision, that is, that only IQ tests
that do not show average racial differences can be used in the future,
bears the potential of being more harmful to educationally needy black
students than the system now in place. Also, this decision may ultimately
lead to tests with less predictive validity than those currently employed.

Judge Grady's attempt to subjectively determine the content bias of IQ
test items has already been shown in the literature to be an ineffective
procedure (see Chapter 4). His folksy method of judging cultural bias
seriously calls into question the value of his conclusions.

When we evaluate the reasoning evidenced in the decisions of Judges
Peckham and Grady within the concepts of bias offered in the present
review, we can better understand how such differences can result. From our
analysis, it appears that Judge Peckham adopted an egalitarian definition
of test bias by accepting the assumption that there should be no IQ
differences among races. This adoption is in preference to the more
technical definitions found in our review of internal and external
construct bias. In addition, Judge Peckham appears to have weighed heavily
those aspects of the case which suggest his concern for what we have
termed outcome bias. As mentioned above, Judge Peckham discounted those
predictive validity studies that use nonsituation-specific criterion
measures (i.e., standardized achievement tests) and that bear more heavily
on establishing the validity of the construct, in favor of criterion
measures that reflect critical outcomes (i.e., school grades). Emphasis on
school grades and the demonstration that IQ tests employed in the future
show empirical evidence of being able to predict whether or not a child
can be successful in a regular class setting with remedial help points to
this conclusion. This is a less stringent criterion, however, than we
offer to give evidence of outcome validity. From our perspective, the
outcome validity of a test employed in decision making for intervention
purposes should require evidence of the potential effectiveness of the
intervention, with outcome bias existing when the assessment yields
interventions with differential effectiveness across groups. Judge Peckham
has only required that one be able to give evidence of the potential
failure of an in-class intervention. By his definition, a child could
still wind up by default in special education. Such a definition also
fails to acknowledge the vital link between assessment procedures and the
intervention that follows.

Judge Grady, on the other hand, did not appear to be influenced by
issues of outcome bias. Rather, his focus appeared to be on internal
construct bias. Consistent with the thinking of those who initially
addressed the issue of cultural bias in tests (see Chapter 4), Judge Grady
turned to an examination of the content of the tests and his subjective
appraisal of bias. The items he ultimately identified as biased in this
manner have not been empirically demonstrated to be biased.

The final resolution of these issues will be settled in higher courts.
Riles has appealed his case to the Appellate Court and chances are that the
Supreme Court will ultimately rule on the case. Professionals in the area
of assessment will hopefully have sufficiently cleared up their own
understanding of the issues to provide adequate direction to the Court.

Issues in Intervention

As we have noted in the previous section, various ethical and legal
issues have been raised in the assessment of children. However, the
assessment of children frequently (and hopefully) leads to an intervention,
and this raises further ethical and legal concerns of similar magnitude.
Intervention with children frequently leads to outcomes that extend beyond
the clinical behaviors on which treatment was focused. A child's life may
be changed dramatically as a function of participation in psychological or
educational treatment.

The ethical and legal issues discussed in this section apply to all
the therapeutic procedures that would usually be used for children
experiencing learning or behavior problems in educational settings.
However, even though writers from diverse orientations have provided
discussion of ethical and legal issues (e.g., Koocher, 1976; Szasz, 1965),
various writers have suggested that some approaches deserve special
consideration. For example, special concerns have been raised over
psychoanalytic therapy for failing to be of demonstrated efficacy in
treatment of neurosis (Wolpe, 1981). Wolpe (1981) has been particularly
critical of the failure of psychoanalysis to demonstrate improvement, even
after many years of treatment.

To keep patients interminably in therapy is an immoral practice
and a social blot on the psychological profession. All of us are
tainted by it. Perhaps in years gone by, one could have argued
that there was nothing better to offer and that the
still-suffering patient at least had the benefit of support. But
it is a moral requirement of any health professional to know the
state of the art in his or her field and be able to offer patients
alternatives when the methods used have failed (Wolpe, 1981, p. 163).

In addition, special concerns have also been raised over behavior
therapy. Some of these issues have been prompted by "false images" in the
lay field (e.g., books such as Mitford's (1973) Kind and Usual Punishment:
The Prison Business, newspapers, and films such as Clockwork Orange) as
well as in the professional community (i.e., inaccurate descriptions of
behavior therapy) (Wolpe, 1981). Yet, the issues extend beyond this
feature. For example, Ross (1980) noted that ethical issues are
particularly critical for behavior therapists because (1) behavior therapy
is a very effective method for bringing about behavior change, and (2) the
rudiments of behavior therapy are relatively easy to acquire so that
individuals other than well-trained behavior therapists can use and hence
misuse them (p. 62). Others, such as Friedman (1975), have also noted that
behavior therapy poses special issues and problems not raised by other
therapies and therefore this approach requires special regulation. Issues
that have been raised include the view that the basic value premises of
behavior therapy may be antithetical to freedom and personal growth (e.g.,
Winett & Winkler, 1972), the view that behavior therapies include
therapeutic assumptions that will lead to poor therapeutic results (e.g.,
Arieti, 1974), and the view that behavior therapy provides a special form
of control over others (e.g., Pines, 1973; Szasz, 1975). Concerns have
also been raised in the media about the application of behavior therapy in
schools, prisons, and society at large.

Despite concerns raised over behavior therapy, the American
Psychological Association Commission on Behavior Modification (Stoltz and
Associates, 1978) adopted the following position:

The commission takes the position that it would be unwise for
the American Psychological Association to enunciate guidelines
for the practice of behavior modification. The procedures of
behavior modification appear to be no more or less subject to
abuse and no more or less in need of ethical regulation than
intervention procedures derived from any other set of principles
and called by other terms (p. 104).

The commission went on to stress that regulation of behavior
modification to the exclusion of other therapeutic procedures would
possibly lead to the demise of the practice of behavior modification in
those settings to which guidelines apply. As Goldiamond (1975, 1976) has
argued, with special guidelines for behavior modification individuals may
be prone to use administratively simpler procedures (i.e., those with
little or no annoyance, delay, or cost) that may be less effective than
behavioral techniques. Moreover, it was noted that specific prescriptive
and proscriptive guidelines could curtail developments within the field
(see Agras, 1973). Thus, all psychological interventions were said to
embrace the same ethical issues that behavior therapy embraces. In this
regard, a primary recommendation of this commission was that individuals
engaged in psychological interventions subscribe to the ethics and
guidelines of their professions. We certainly concur with this perspective
and would note that individuals engaged in psychological interventions for
children should follow the guidelines of their respective professions.

Typically, mental health and educational professionals belong to
more than one professional organization that provides guidelines. For
example, a psychologist might belong to APA but also subscribe to the
guidelines of an organization in his/her area of expertise (e.g.,
education, mental retardation, behavior therapy). In this regard, the
professional organization may offer a rather detailed list of guidelines
for therapeutic intervention. Such is the case with the Association for
Advancement of Behavior Therapy (AABT), which offered the ethical issues
for human services. These guidelines are to be considered prior to
implementing a behavior therapy program. Technically, however, the
guidelines are not related specifically to behavior therapy because each
issue could be considered by a therapist implementing any type of
intervention.

In the following section of the chapter we review some ethical and
legal issues that must be considered in implementation of interventions for
children experiencing learning and behavior problems. These issues include
control of behavior, agents of control, informed consent, selection of
intervention, monitoring intervention, and therapist qualifications.

Control of Behavior

The intervention procedures used in educational settings raise ethical
and sometimes legal issues over the control of behavior. Behavior here
refers to thoughts, feelings, and images, as well as overt behavior.
Control of behavior refers to exerting some kind of power over people by
manipulating the environment to increase or decrease behaviors (Ulrich,
1967). A major concern here is that behavior will be manipulated toward
some undesirable ends (Kazdin, 1980; London, 1969). Individuals disagree
as to which type of goals or ends are desirable and so there is usually
controversy over this issue. As is evident from a review of interventions
in contemporary journals in psychology and education, these extend well
beyond psychological procedures. For example, technological and
biochemical interventions have been and are being used to treat children
who have learning and behavior problems.

Issues of behavior control are an especially major concern in
intervention with children. Generally, children can be "controlled" more
easily than adults. Moreover, from an early age the child is totally
dependent on others (Hobbs, 1975). The point at which the child is able to
control his/her own behavior has been the source of much controversy and is
certainly far from settled (Melton, 1981). The central issues relating to
ethics and behavior control with children experiencing learning and
behavior problems center around two major issues: (1) the issue of
controlling children who have difficulty gaining counter control over their
own environment (e.g., the child is unable to learn in the regular class),
and (2) the belief among mental health professionals that their
interventions are being implemented in "the best interests" of a child to
assist him/her to develop satisfactory adjustment. The ethical issue here
refers to the use of some intervention procedures used in treatment,
whether psychological or psychopharmacological, which may be essentially
ethically neutral; they have the potential for use or misuse. Thus, within
the context of any intervention program, it may not be so much the
intervention, but rather the manner in which the intervention technology is
used by the professional that raises the ethical issues.

On the other hand, there may be some treatments that could be regarded
as ethically troublesome. For example, serious ethical and humanitarian
issues have been raised in the use of implosion/flooding treatments with
children (Graziano, 1975; Graziano, DeGiovanni, & Garcia, 1979). Such a
procedure is usually quite aversive to the client, and yet the child may
not feel free to withdraw from treatment. Moreover, as noted by some
writers (e.g., Graziano et al., 1979; Ullmann & Krasner, 1975),
implementation of implosion requires considerable skill so that the
aversive treatment procedure is associated with the feared stimulus rather
than the therapist.

The intervention procedures used in educational settings do not always
specify how various treatment goals might be attained. A behavior
therapist might recommend a modeling treatment strategy for a hyperactive
child. These procedures provide a therapeutic technique to reach some
goal: reduction of activity level. Yet, the goal of activity reduction is a
value judgment on the part of the therapist and perhaps for society at
large. Also, individuals may not agree on societal goals. Nevertheless,
as Kazdin (1980) notes, "A scientist might well be able to predict where a
preselected goal will lead, make recommendations to avert undesirable
consequences, or investigate the actual effects of pursuing certain goals.
Yet, the initial selection of the goal is out of the scientist's hands"
(p. 311). This is frequently the case for psychologists and other mental
health professionals who work in institutional settings. For example, the
school psychologist would be expected to work toward the goal of getting
the emotionally disturbed child back into the regular classroom. This
control issue may conflict with both the child's and the parents' goals.

The fact that many goals are out of the scientist/professional's hands
does not deny the possibility that they can reshape or change the goal.
Some goals, such as the one indicating that all children should attend
school, might be modified in the individual case. Generally, however, the
goals of the client are compatible with social goals. This is usually true
in cases of learning and behavioral problems where unpleasant and even
aversive consequences are associated with the problem.

Although the many techniques used to treat children are ethically
neutral, there will be no neutral or value-free position in actual
implementation of the technology (Kazdin, 1980). Thus, any practice and
study of human behavior change does not remain value free (Krasner, 1966;
Rogers & Skinner, 1956). Thus, endorsing the goal of the individual,
socialization agents, the institution, or society reflects a definite value
position (Kazdin, 1980). Many of the intervention procedures described in
the professional literature provide useful techniques for children
experiencing learning and behavior problems. The issue of who may control
these techniques and establish the goals for behavior control will continue
to be controversial.

Sometimes, aversive techniques are considered for children
experiencing learning and behavior problems. For example, such procedures
as punishment, implosion, or flooding might be used. Increasingly,
techniques involving aversive procedures have come under judicial review.
Typically, the courts have become involved when individuals might be
exposed to "cruel and unusual" punishment. In some cases isolation
procedures have been ruled illegal (New York State Association for Retarded
Children v. Rockefeller). When aversive procedures, such as timeouts, are
employed, clients must have access to food, lighting, and personal hygiene
facilities (Hancock v. Avery). Moreover, the courts have also ruled that
isolation may only be used for behaviors leading to physical harm and/or
destruction of property (Morales v. Turman; Wyatt v. Stickney). Even then,
these procedures must be monitored by professional staff. More severe
punishment procedures such as electric shock can only be used in those
unusual circumstances where the client is engaging in severe
self-destructive behavior (Wyatt v. Stickney).

Generally, there have not been court rulings on many of the specific
aversive procedures that might be used in the treatment of children. An
important consideration in treatment of learning and behavior problems is
that aversive procedures may, in most cases, be inappropriate. Thus, the
practitioner should consider the range of possible nonaversive procedures
that could be used in therapy.

Personal Rights

A major issue related to the control of behavior is personal or human
rights in therapy (Schwitzgebel & Schwitzgebel, 1980). Most interventions
involve a definite control of behavior and so issues of personal rights and
freedom are raised. This is an especially important issue with children
because therapists typically intervene for the good of the child. But, as
Ross (1980) has questioned, can the professional always be trusted to
protect the child's rights and best interests? The issue of the child's
rights is somewhat different from that of an adult (Ross, 1980). First of
all, the child is usually brought to treatment or referred to treatment by
an adult. Typically the child does not volunteer for therapy. Second, the
child usually does not decide what the goals of treatment should be, or
what treatment should be, or when treatment should terminate. Issues like
these have prompted some individuals to recommend a "Bill of Rights" for
the child client (Ross, 1974, 1980; Koocher, 1976). The rights include
four basic principles:

The Right to be Told the Truth. The basic premise of this principle
is that the child should not be deceived. The child should be told the
truth regarding the purpose for treatment and what it will involve.

The Right to be Taken Seriously. The child's perspective on his/her
problem and his/her opinion on various issues should be seriously
considered and not dismissed because a child is speaking.

The Right to Participate in Decision Making. The child should be
included in decisions that are made regarding his/her treatment program.
Adopting such a strategy does not imply that the child's perspective will
determine the final decision. Yet, the child's perspective should be
considered along with those of others involved in the therapeutic program
(e.g., parent, teachers).

One method to protect the rights of the child would be to develop a
contractual arrangement for services (e.g., Kazdin, 1980; Schwitzgebel,
1975; Schwitzgebel & Schwitzgebel, 1980; Stuart, 1977). The advantages of
a contractual arrangement include the following:

1.  The contract spells out mutual goals and commitments.

2.  Contracts can be used in a variety of settings.

3.  Contracts encourage negotiation of privileges and
responsibilities.

4.  Contracts reduce disagreements over what is to take place in
therapy.

Selection of Intervention


In intervention with children's learning and behavior problems, the
professional must consider the relative efficacy and the efficiency with
which the problem can be solved (Wilson & O'Leary, 1980). Prior to
implementation of any intervention program the professional should consider
several issues (Morris & Brown, 1982).

Is the treatment program consistent with the available treatment
literature? (If it contains any novel intervention approaches
and/or if a new treatment method is being proposed where
there are no data to support its efficacy, the therapist may
want to propose the treatment as an experimental procedure.)

Is the program consistent with the overall treatment objectives
for the child and is it in the child's best interests?

Does the program involve the least restrictive alternative
program for the child?

Can the program be carried out easily given the number of staff
available and the level of staff training and competence?

Will the child's progress be monitored using a specific procedure
and will the child be observed closely for possible adverse
side effects of the program?

Have the staff been trained to a criterion level to ensure the
provision of quality treatment?

Has informed consent been obtained from the client and/or the
parents/guardians?

Each of these issues poses special problems for the professional
involved in treating children. Each will be discussed as it relates to
ethical and legal issues in the field.

Available Treatment Literature. In our analysis of the ethical
factors governing the selection of treatment procedures for children, the
choice of one treatment over another would be based on a careful review of
the literature (McNamara, 1978). To help the therapist decide on what type
of treatment to employ, the following series of questions can be helpful:

1.  How effective is a given technique for the presenting
problem?

2.  How costly is the procedure relative to other techniques
known to be of equal benefit?

3.  Are there any negative side effects associated with the
procedure?

4.  How durable are the effects of the treatment?

5.  Does the treatment have a high probability of being
implemented by the therapist, client, and/or provider?

Some of these questions are obviously research questions and relate to
methodological and conceptual work. It is clear that the professional
should examine the research literature to answer many of the questions that
will arise in this area of ethical concern.

Intervention Objectives. A primary intervention objective is to
change the behavior (eliminate the problem) so that professional
involvement can be terminated. Generally, intervention goals should be
individual and specific to the problem of concern. Several questions can
help guide the professional toward more effective intervention objectives
(Martin, 1975, pp. 69-70):

1.  Does your program have a concrete, objectively stated goal?

2.  Is it directly related to the reason the individual was
brought to your attention?

3.  When it is achieved, can your involvement with the client be
terminated?

4.  Will the change benefit the individual more than the
institution?

5.  Can the goal be achieved?

6.  Is the goal a positive behavior change rather than a negative
behavior suppression?

7.  Does the goal involve changing a behavior that is actually
constitutionally permissible?

A question that can be raised in intervention programs with children
is to what extent the child can participate in setting objectives and
goals. This question essentially raises the issue of the competence of the
child to make important decisions bearing on interventions. Some
information related to the child's ability to consent to interventions has
come from Grisso and Vierling (1978). These authors reviewed the
developmental research literature and reached the following conclusions:

1.  There may be no circumstances that would justify sanctioning
independent consent by minors under 11 years of age, given
the developmental psychological evidence for their diminished
psychological capacities.

2.  There appear to be no psychological grounds for maintaining
the general assumption that minors at age 15 and above cannot
provide competent consent.

3.  Ages 11-14 appear to be a transition period in the
development of important cognitive abilities and perceptions
of social expectations, but there may be some circumstances
that would justify the sanction of independent consent by
these minors for limited purposes, especially when competence
can be demonstrated in individual cases (p. 424).

Unfortunately, these conclusions were not based on research directly
bearing on intervention decisions in real life situations (Melton, 1981).
With the lack of such a data base, it is likely that the courts would
accept the somewhat arbitrary age of majority of 16 years for informed
consent. Melton (1981) noted that youngsters are usually competent to give
consent at least after age 15. Yet, this would depend on individual
differences, cognitive abilities, and the unique circumstances of the
problem. In any case, the professional should determine the actual
capacity to give consent and plan therapeutic goals.

Least Restrictive Alternatives. The least restrictive alternative
applies to both consideration of alternatives to commitment and
alternatives for interventions available (Schwitzgebel & Schwitzgebel,
1980). For example, in the Wyatt case it was noted that individualized
treatment plans are necessary for an effective program "and each plan must
contain a statement of the least restrictive treatment conditions necessary
to achieve the purposes of commitment." A major goal in providing services
for children experiencing learning and behavior problems should be to
select an intervention that is relatively nonintrusive or nonrestrictive.
Providing the child with the least restrictive alternative intervention
will promote the opportunity to change under minimally intrusive and
restrictive conditions. The terms "restrictiveness" and "intrusiveness"
refer to "methods that involve a high degree of obvious external control,
especially those based on aversive control" (p. 289). Friedman (1975) has
defined "restrictiveness" in terms of "a loss of liberty", and
"intrusiveness" in terms of placing a person at risk, using force to
modify the behavior of a person, invading someone's body, or the loss of
personal autonomy.

In addition to definitions of intrusive or restrictive
intervention methods, two sets of criteria have been proposed; one of these
is for the intrusive nature of a particular intervention (Shapiro, 1974),
and the other has been developed to evaluate the intrusiveness of
behavioral and other procedures with prisoners and psychiatric patients
(Speece, 1972). Shapiro (1974) proposes the following six criteria:

1.  Is the effect of the therapy procedure reversible?

2.  Does the effect of therapy result in behaviors which are
judged to be maladaptive and/or inconsistent with "normal"
functioning?

3.  How quickly does the behavioral change occur following the
initiation of the therapeutic procedure?

4.  To what extent can a person avoid behaving in the planned
manner?

5.  What is the duration of the resulting behavior change?

The criteria proposed by Speece (1972) also include components that
can be applied to interventions in educational settings:

1.  The nature and intensity of the collateral behaviors and
other side effects which develop as a result of the
procedure, as well as the duration of the effect on the
targeted behavior.

2.  The extent to which an uncooperative client can avoid the
procedure, i.e., exert countercontrol vis-a-vis the
therapeutic procedure.

3.  The extent to which the procedure involves the introduction
of physical contact with the body of the client.

It seems clear that procedures advocated in the literature on treatment of
children are sometimes intrusive and restrictive.

Generally, the principles associated with the concept of least
intrusive or restrictive intervention necessitate that more intrusive
methods be applied only after less intrusive methods have been demonstrated
to be ineffective. Morris and Brown (1982) have proposed a system based on
the provision of services to the mentally retarded, but which is useful in
the treatment of children experiencing other learning and behavior
problems. This system, described in Table 8.1, varies along both the
dimensions of restrictiveness/intrusiveness and aversiveness (as defined in
terms of the frequency, intensity, duration, and topography of the aversive
intervention introduced to decrease the child's target behavior). In this
system, professionals should demonstrate that Level I interventions have
been ineffective in controlling a behavior before proceeding to implement
Level II treatments. In a similar manner, prior to implementation of Level
III procedures, the professional would have to demonstrate that Level II
procedures were ineffective. These considerations must also be employed
within the context of other ethical imperatives (e.g., human rights,
informed consent).
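
The escalation rule in this level system can be sketched as a simple
check. The following is only an illustrative sketch, not part of the Morris
and Brown (1982) proposal: the names Level, PROCEDURE_LEVEL, and
may_implement are hypothetical, only a few procedures from Table 8.1 are
listed, and it is assumed that ineffectiveness must be documented at every
lower level, not only the one immediately below.

```python
from enum import IntEnum


class Level(IntEnum):
    """Restrictiveness/intrusiveness levels as named in Table 8.1."""
    I = 1
    II = 2
    III = 3


# Abbreviated, illustrative mapping of procedures to levels (from Table 8.1).
PROCEDURE_LEVEL = {
    "reinforcement": Level.I,
    "extinction": Level.I,
    "response cost": Level.II,
    "exclusion time-out": Level.II,
    "overcorrection": Level.III,
    "seclusion time-out": Level.III,
}


def may_implement(procedure, ineffective_levels):
    """Return True when every level below the procedure's own level has
    documented ineffectiveness (assumption: all lower levels are required)."""
    level = PROCEDURE_LEVEL[procedure]
    required = {lower for lower in Level if lower < level}
    return required <= set(ineffective_levels)
```

A check of this kind covers only the ordering of levels; it says nothing
about informed consent or the other ethical imperatives noted above, which
would still have to be satisfied separately.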

Table 8.1

Proposed Levels of Restrictiveness/Intrusiveness and
Aversiveness of Behavior Modification Procedures
with Mentally Retarded Persons

Level I Procedures

Reinforcement
Shaping
Modeling
Token Economy System
Ecological/Behavioral Engineering
Self-control
Reinforcement of Incompatible Behaviors
Extinction

Level II Procedures

Contingent Observation
Exclusion Time-Out
Response Cost
Contact Desensitization

Level III Procedures

Overcorrection
Seclusion Time-Out
Negative Practice
Satiation
Physical Punishment

Source: Morris, R. J., & Brown, D. K. Legal and ethical issues in behavior
modification with mentally retarded persons. In J. Matson and F. Andrasik
(Eds.) Treatment issues and innovations in mental retardation. New York:
Plenum Publishing Co., 1982. Reproduced by permission.

Available Professionals and Training. An important issue in treating
children involves the consideration of who will carry out the program and
whether those individuals are trained (qualified) to do so. Even though a
specific procedure might be available for use in treatment, individual(s)
qualified in its delivery must be available for either the direct service
or supervision of the individuals who will carry out the program. Many of
the procedures used with children might appear deceptively simple, but in
reality are quite complex when correctly implemented. For example, Agras
(1973), in discussing the qualifications of a well-trained behavior
therapist, noted that this individual:

    ...must have knowledge of the principles underlying behavior
    modification, experience in the application of such knowledge to
    human behavior problems, and experience in the experimental
    analysis of deviant behavior, both for research purposes and as an
    approach to the on-going evaluation of clinical care. (S)he must
    also, however, demonstrate certain less well-defined
    characteristics, usually referred to as general clinical skills
    (p. 169).

As noted by Wilson and O'Leary (1980) such "clinical skills" are
typically acquired through formal graduate training. Professional
organizations have, in some cases, developed guidelines for the delivery of
services (see Chapter 9). For example, psychologists would follow the
Specialty Guidelines for the Delivery of Services (APA, 1981) in their area
of psychological specialty; for the APA these include guidelines in the
areas of clinical, counseling, industrial/organizational, and school
psychology. In and of itself, graduate training will certainly not
guarantee competence. Individuals may also be certified or licensed by
state boards. Moreover, individuals may also belong to professional
organizations that provide certain status or recognition for competence in
a certain area (e.g., diplomate status in APA).

But, there are issues of quality intervention that extend beyond the
professional's skills. Even if the professional is well-qualified to
deliver services, many programs for children's problems are implemented by
paraprofessionals and/or the child's providers (e.g., parents). In such
cases, the professional will be involved in supervision of those
individuals providing the intervention services. At least two issues are
important here, namely, training and monitoring of the individuals (cf.
Martin, 1975). In some cases, such as in institutional programs,
individuals might be selected for intervention implementation. In other
cases, the professional will need to ensure that these individuals are
trained. For example, in some institutions this training might take the
form of orientation, pre-service training, implemented in-service training,
and planned in-service training (see Martin, 1975, pp. 110-112).
Certainly, these procedures appear necessary and desirable in most
institutional settings. Yet, implementation of these strategies in some
schools and especially in home settings may prove especially difficult.
Nevertheless, some formalized attempt must be made to provide the
individuals implementing the program some sort of training to carry out
the task.

In addition to a training component, supervision of the ongoing
services will be necessary. Such supervision is aimed at ensuring that the
program is being implemented as intended and at revising it if it is
not working. In such cases, data must be gathered on the client (see
below).

Program Monitoring. In order to ensure that the intervention is being
implemented correctly and that it is having desirable effects it must be
monitored by the professional or his/her designees. Some writers believe
that ongoing evaluation of clinical services is essential (Barlow, 1980,
1981). Methods through which this can be accomplished will vary from case
to case, but would include self-report inventories and checklists,
self-monitoring, and direct observation by parents or teachers (e.g.,
Nelson, 1981). For many practitioners this data gathering operation will
prove especially difficult and costly. Yet, some attempts must be made to
gather data on client outcome.

Aside from monitoring data to determine if the intervention is having
an impact, it is desirable to gather data to monitor any side effects, both
positive and negative. In the case of positive side effects, an especially
effective intervention will usually result in positive behaviors for the
child. The child may return to the regular classroom, improve academic
performance, develop new friends, acquire new social skills, and so forth.

Monitoring undesirable side effects is also important with certain
intervention procedures. For example, implementation of certain aversive
procedures such as implosive therapy may result in development of
undesirable behavior, such as avoidance. In addition to monitoring certain
undesirable behaviors of the child, the professional should consider the
potential negative influences of a program on the parents and/or siblings.
Moreover, a desirable change in the child's behavior could result in a
negative change in a parent's or other sibling's behavior.

Gathering data to monitor the intervention program and its side
effects should not necessarily be regarded as research, for these two
activities are different on both methodological and conceptual grounds (cf.
Kratochwill & Piersel, in press). Yet, by gathering some data on the
client, the professional becomes accountable to the consumer (Wilson &
O'Leary, 1980). Thus, rather than reporting "success" through subjective
means, the professional may be able to provide some type of credible data
to document change.

Informed Consent. Some of the issues involved in informed consent
have already been elucidated in the context of assessment. Similar
concerns must be addressed in implementation of an intervention program.
Three major issues can be raised regarding informed consent in therapeutic
work with children, namely competency to give consent, freedom from
constraint, and clarity in the information given (Stuart, 1981).

Competency to Give Consent. As noted above, the issue of when and if
children are competent is characterized by considerable controversy.
Usually, parents or guardians must play a primary role here, since the
child might be judged as incompetent to give consent. Yet, in other cases,
especially where guardianship has not been established, a group or
committee external to the institution or circumstances should assign an
advocate to the child to assist him/her in determining whether the
intervention program is acceptable (Morris & Brown, 1982; Ross, 1980).

Even in the case of adults, the issue of when an individual is
competent to give consent is quite subjective. Some authors have
characterized competency as the appearance that the individual knows what
he/she is doing (e.g., Hardisty, 1973) or as seeming to know what he/she
is doing in a layman's sense (e.g., Roth, Meisel, & Lidz, 1977). In some
cases the courts have upheld that persons are considered legally competent
unless it can be proven otherwise (Lotman v. Security Mutual Life Insurance
Co., 197?).

Five different methods have been proposed for determining competence
(Roth,  Meisel,  &  Lidz,  1977,  reviewed  by  Stuart,  1981,  pp.  719-720): 

1.  A  person  may  be  judged  competent  if  he/she  shows  a  clear 
desire  to  participate  in  the  activity. 

2.  Competence  can  be  inferred  from  the  judgment  that  the  person 
has  made  a  "reasonable"  choice  (Friedman,  1975). 


3.  Competence can also be inferred from the belief that
participation in a program is based on a rational process
(Stone, 1975).

4.  Competence is inferred when the person displays the ability
to understand the nature of the intervention.

5.  The competence of the individual is evaluated by assessing
the actual level of understanding of the procedure.

Procedures for selecting one of these tests of competence have been
proposed (Roth, Meisel, & Lidz, 1977). The tests range from the least
stringent (consent through participation) to the most stringent
(demonstrating understanding). It would appear that such "tests" could be
applied with children. Nevertheless, these criteria would need to
withstand the scrutiny of the courts; currently they appear quite
subjective (see Stuart, 1981 for further discussion of these issues).

Freedom from Constraint. Coercion occurs "when false or incomplete
information is given about proposed procedures, when nonparticipation is
punished in a way other than by simple loss of the potential benefits of
participation, or when compliance is obtained through physical coercion"
(Stuart, 1981, p. 721). When these factors enter into the intervention
process truly informed consent is not possible. Constraint is especially
worrisome in the case of children.

Unfortunately, even the alternative of assigning an advocate to the
child does not allow him/her to refuse intervention. The issue of
intervention refusal is an important one especially with children and
specifically in the case of a severe behavior problem. It would appear
that children should have the right to refuse intervention, but the right
to provide intervention also exists. When a compromise cannot be
developed, formal legal rulings may be the only alternative (Morris &
Brown, 1982).

Some  of  the  more  salient  legal  problems  with  refusal  of  intervention 
include  the  following  (see  Stone,  1975): 

1.  The client's competency to decide whether or not to refuse
treatment.

2.  Procedures for obtaining informed consent of a severely
disturbed but legally competent individual.

3.  Handling objections on religious grounds.

4.  The civil liability of a practitioner if a client who has
refused treatment injures him/herself or others.

5.  Increased cost to taxpayers of individuals who refuse less
expensive treatment and insist on more expensive ones
(Schwitzgebel & Schwitzgebel, 1980, p. 53).

Generally, the individual involved in treating children's learning and
behavior problems must determine the level of coercion in each case and
minimize it within the professional relationship.

Clarity of the Information Given. The clarity of the information
given can influence the degree to which consent is truly informed.
Generally, information should be complete and communicated in a clear
fashion. For therapeutic purposes, a multiple-part consent form can be
employed (Martin, 1975; Miller & Willner, 1974).

Issues in Research

Experimentation with children experiencing learning and behavior
problems also raises a number of ethical and legal considerations. Many of
the issues that are raised in research are similar to those that have been
presented within the context of assessment and intervention. Yet, some
special issues emerge in research simply because research is the primary
activity. Several considerations have been advanced (Kazdin, 1980).
First, since experimental research usually requires manipulations of
variables, subjects could be exposed to certain conditions that are harmful
or stressful to them. For example, a child experiencing a severe emotional
problem might be exposed to an intervention that causes a great deal of
stress and anxiety. Whether or not a child should be exposed to
interventions that cause stress raises both ethical and legal issues.

Second, in research, information may be withheld
from the child. Providing information may reduce the efficacy of the
intervention or conflict with the goals of experimentation. Yet,
withholding information from the child and/or the parents may not meet
informed consent guidelines.

Third, the actual data collection that occurs in the typical research
process may invade the privacy of the child and his/her parents. For
example, as part of data monitoring in a home setting, certain private and
personal information might be revealed.

Fourth, some of the methodological requirements of research may
conflict with intervention objectives for the child. For example, in
between-group research some subjects might be assigned to a condition that
provides no intervention or to a condition in which the child receives an
intervention known in advance to be less effective than another available
method.

Fifth, the differential status between the investigator and the child
raises ethical concerns in that the child becomes vulnerable to possible
abuses. Children might not object to some intervention that an adult would
readily object to. Because children are frequently in a "non-power"
position, they are more likely to suffer certain types of abuses.

There has been growing recognition that the protection of human
subjects in research is necessary. A number of laws regulating research
with human subjects have been proposed. The Nuremberg Code (1946),
Declaration of Helsinki (1964) and the Institutional Guide to DHEW Policy
on Protection of Human Subjects (1975) all testify to the recognition that
human rights are important in research. Some of these are specific to
children. In addition to these, the APA (1981) under Principle
9 of the Ethical Principles of Psychologists has provided 10 guidelines for
research with human subjects.

In this chapter we review some major legal and ethical issues that
have been raised in the conduct of research with human subjects. Our
discussion is not comprehensive, but is designed to elucidate some specific
issues that emerge in assessment and intervention research on children. For
a more detailed discussion of ethical and legal issues in research, the
reader should consult several sources (e.g., Bersoff, 1978; 1979; Brady,
1979; Kazdin, 1980; Kelman, 1971; McNamara & Woods, 1977; Schwitzgebel &
Schwitzgebel, 1980).

Informed Consent

The informed consent doctrine first emerged as a formal rule for the
physician-patient relationship (Bersoff, 1978). Essentially, the issues
raised in assessment and intervention apply in research. Yet, by virtue of
labeling one's activity as "research", some special concerns emerge. For
example, typically the investigator must make a formal request for
conducting the research and have a research proposal reviewed by an
independent committee. Some special problems that may arise in this area
involve the capability of children to give consent (see above discussion).
Also, providing advance knowledge about a particular intervention may be
difficult if it is a new technique and/or little is known about its
influence from previous research. Such factors may make a truly
knowledgeable decision impossible (Kazdin, 1980). A number of suggestions
have been advanced for demonstrating that potential research subjects are
informed prior to providing consent to participate in a research program.
For example, the use of a two-stage (Miller & Willner, 1974) and a
three-stage (Stuart, 1978, 1981) consent form have been proposed.
Grabowski, O'Brien, and Mintz (1979) proposed a system based on
well-constructed information forms and correlated multiple choice items.
The materials include a description of the consent procedures, a statement
of purpose, description of experimental procedures and alternatives, and
statements stipulating that withdrawal is an ongoing option. Despite these
options, providing such information has been shown to influence both the
subject's willingness to participate (Stuart, 1978) and the potential
results (Grunder, 1978).

Even when precautions have been taken to inform the subject, questions
have been raised over the meaningfulness of the activity (Palmer & Wohl,
1972). Subjects may forget that they signed a consent form or indicate
that they did not understand the purpose of the study.

Sometimes researchers may not inform subjects that they will be
randomly assigned to conditions, assuming that they will then refuse to
participate. McLean (1980) studied the effects of informing clinically
depressed subjects that their treatment assignments were made on a random
basis, in terms of their willingness to consent to intervention. He found
that none of the 104 subjects who were informed of random assignment
refused to participate in the program. Also, there was only a negligible
difference in subjects' willingness to consent between the informed and
uninformed conditions. McLean (1980) noted that the issue of random
assignment may be less critical than other issues raised over informed
consent procedures.

Deception

Closely related to the informed consent notion is the use of
deception. Technically, true informed consent might be held to be free of
any deception. Yet, the issue in research is whether or not the deception
is justified in light of the benefits that might ensue from the research
(Kazdin, 1980). Although the scientific contributions of a study may
determine if the deception can be justified (Kelman, 1968), the benefits of
a particular study can be difficult to assess, especially when the
researcher has vested interests in the investigation (Kazdin, 1980). Based
on issues such as these, the justification for deception in research
depends on several considerations:

    First, the scientific investigation must merit the type of
    deception that is used. Whether or not the deception is merited
    is, however, a subjective judgment that requires reliance on
    persons other than the possibly biased investigator. Second,
    there must be assurances that alternative methods of
    investigation that would produce the information proposed in an
    experiment that uses deception is entirely an empirical matter.
    Researchers may argue in all honesty about the extent to which
    deception is essential. Third, the aversiveness of the deception
    itself bears strongly on the justification of the study.
    Deceptions vary markedly in degree, although ethical discussions
    usually focus on cases where subjects are grossly misled about
    their own abilities or personal characteristics. Finally, the
    potential for and magnitude of the harmful effects of the
    deception on the subjects also dictate whether the deception
    would be justified. Whether an experiment using deception is
    justified needs to be weighed carefully. Increasingly, research
    that seriously misleads the subject simply is not permitted
    (Kazdin, 1980, p. 390).

Debriefing 

Once any deception has been employed in research, it is the
responsibility of the professional to describe the nature of the
experiment, that is, to debrief the subject regarding the purposes of the
study and what was done in the study. A major purpose for this debriefing
activity is to minimize any stress or problems that may have been a
function of the actual deception (Kazdin, 1980; Kelman, 1968).

Although debriefing appears to be an important activity for the
researcher, many unanswered questions are raised regarding this particular
tactic. For example, it is possible that the debriefing activity does not
resolve the problems that were raised for the client. In this regard, it
is possible that a youngster who is exposed to an aversive situation in a
study may not, in fact, feel more comfortable when debriefed. It is quite
possible that the subject may feel hostile or fearful toward the
experimenter no matter how much debriefing takes place.

Questions might also be raised as to when the debriefing should take
place. For example, it might occur immediately after the subject
participates or after all subjects have participated in the experiment. In
the latter case it might be assumed that debriefing all children at the
same time at the end of the study would minimize communication among
subjects if this is an issue. However, this would need to be weighed
against the potential negative effects of having the child experience a
period of time under which the deception was employed. Another issue is
that even if attempts are made to debrief the subject regarding the nature
of the study, it cannot always be assumed that the person understands what
was done and why it was done. In some respects, the same problems that
occur in debriefing are those that emerge in the informed consent issue.
Particularly with young children the issue of understanding the debriefing
activity might be raised. As Kazdin (1980) has noted, the investigator
employing any deception must demonstrate that debriefing activities were in
fact successful. It behooves the investigator to make sure that the
debriefing activities are systematic, well controlled, and monitored.

Summary and Conclusions

In this chapter, we have provided an overview of the ethical and legal
considerations in assessment, treatment, and research of minority and
nonminority children. The issues raised with respect to children apply to
research, intervention, and assessment in these areas and also extend
beyond work in this area whenever children are involved. The work with
children in assessment, intervention and research involves considerations
of several factors including law, ethics, and morality. These influences
provide a conceptual guide for the professional involved in work with
children's learning and behavior problems. As we noted, laws have
provided one of the strongest influences on professional behavior, but in
many cases laws have yet to be enacted for specific situations and, in many
cases, they have been postscriptive rather than prescriptive. As a second
source of influence, ethics have usually been developed as guidelines for
individuals working in the field. In practice, rights and ethics overlap
and it is important for the clinician to consider the various ethical
guidelines for professional behavior across the various disciplines to
which he/she adheres. Finally, moral principles have provided guidelines
for conduct that transcend specific laws and ethical codes. These typically
refer to absolute assumptions about the rights and responsibilities of
individuals.

In assessment work with children experiencing learning and behavior
problems, a number of issues have emerged. Specifically, some criticisms
of assessment, including invasion of privacy, creating an unfavorable
atmosphere, developing labels, and engaging in discriminatory practices,
have all been advanced. Each of these considerations must be noted in any
assessment work. Several influences from the legislative and judicial
areas, as well as professional associations, have been raised for guiding
assessment activities. These were reviewed as they appeared relevant for
assessment of children's learning and behavior problems. The Larry P. and
PASE decisions were examined closely. The different conclusions reached by
the judge in each case were suggested to be a result of the definition of
bias each judge adopted.

A number of issues have been raised in intervention efforts for
children. In this area of learning and behavioral disturbance, issues have
been raised in the control of behavior, personal rights, and selection of
interventions. When selecting a particular intervention strategy, the
professional must consider the available literature, specific intervention
objectives, least restrictive alternatives, available professional staff
and training of these individuals, monitoring of the program established,
and informed consent. The informed consent notion strongly advocated in
any intervention efforts is a complex one and not easy to address. Issues
that were reviewed here included the competency to give consent, freedom
from constraint, and clarity of the information given to the child and/or
his/her providers.

Several issues involved in intervention research on children's
learning and behavior problems were also reviewed. These included
informed consent, invasion of privacy, deception, and debriefing. Each of
these issues was reviewed in the context of some issues from the
professional literature reviewed in earlier sections of the report. It is
hoped that future assessment, intervention, and research activities of
individuals working with children experiencing learning and behavior
problems will be guided through considerations raised in this chapter.

Chapter 9

The  Influence  of  Professional  Organizations 

Professional organizations have influenced assessment
activities in general and issues in the area of test bias
specifically. The impact has been in several areas (Oakland
& Laosa, 1977). First, various professional
organizations have become involved in making public
statements on testing and test bias. Indeed, as will be
emphasized below, some groups have taken a formal position
against standardized tests in educational settings. Second,
certain professional groups have published guidelines to
accompany various assessment practices. Such guidelines
often specify the nature of professional conduct in the
choice, administration, and use of tests and assessment
practices. Third, some professional groups have been
involved in certifying and licensing individuals who offer
these psychological or educational assessment services.

In this chapter we review some of the professional
groups that have been active in establishing positions and/or
have prepared documents related to assessment practices. It
should be emphasized that although a number of professional
groups/organizations have considered issues relevant to bias
in assessment, only a few have provided any formal guidelines
related to practice. The professional organizations that
have provided standards for assessment/intervention are
listed in Table 9.1. This list is by no means exhaustive but
should alert readers to consider the existing standards and
guidelines from their own professional organizations. A
review of other organizations and societies that have
developed ethical policies can be found in the AAAS
Professional Ethics Project (1980). This document provides a
review of guidelines developed in both the physical and the
social sciences.


Groups Representing "Marxist" Opposition to Tests


Several self-identified "Marxist" groups have come out
in opposition to various psychological tests. Jensen (1980)
reviewed some perspectives in this area, so we will only
present a brief overview. As Jensen (1980) notes, there is
nothing intrinsic in original Marxian theory that would be in
opposition to mental tests. Moreover, although IQ tests were
once disdained in the Soviet Union, testing is still
apparently common in that country. Nevertheless, some
opposition to tests, particularly in the study of individual
differences, has been advanced (Teplov & Nebylitsyn, 1969).
Presumably, Soviet psychologists have relied somewhat less on
tests than psychologists in the United States.

Table 9.1

Professional Organizations That Have Provided
Standards/Guidelines for Assessment/Practice

American Educational Research Association (AERA)
American Personnel and Guidance Association (APGA)
American Psychological Association (APA)
Association of Black Psychologists (ABP)
Association for Advancement of Behavior Therapy (AABT)
National Association for the Advancement of Colored People (NAACP)
National Association of School Psychologists (NASP)
National Council on Measurement in Education (NCME)
National Education Association (NEA)
Society for the Study of Social Issues* (SSSI)

*The SSSI is Division 9 of the American Psychological
Association.

(Source: Kratochwill, T. R., Alper, S., & Cancelli, A. A.
Nondiscriminatory assessment: Perspectives in psychology and
special education. In L. Mann & D. A. Sabatino (Eds.), The
fourth review of special education. New York: Grune &
Stratton, 1980. Reproduced by permission.)

Marxists outside the USSR have expressed objections to
ability tests (e.g., Lawler, 1978; Simon, 1971). One
perspective on this is presented by Simon (1971):

    Since, in a class society, on average, the higher
    the social status, the greater the likelihood that
    test questions of the kind described can be
    answered, a test standardized this way is bound to
    set standards of "intelligence" which are largely
    class differences disguised. It is an inescapable
    fact that the middle class child will always tend
    to do better than the working class child, as a
    necessary result of the way in which the tests are
    constructed, validated, and standardized (p. 78).

Groups Representing Minorities

Several different professional organizations that
represent minority groups in the United States have made
formal statements regarding the use of standardized tests.
The Association of Psychologists for La Raza (APLR), an
organization for Chicano psychologists, does not have an
official position on minority assessment. However, the
president of the association responded to the APA report
"Educational Uses of Tests with Disadvantaged Students"
(Cleary et al., 1975). Although the report stressed fair
assessment practices, Bernal (1975) pointed to various
oversights in the report:

    The key arguments of many critics of extant testing
    and test development procedures have not been
    discussed or answered, recommendations for
    improving test development with and for minorities
    have not been set forth. The blame for bad testing
    appears to have been shifted to the practitioner,
    and the schools seem to be the only institutional
    villains in the story. In short, the classic "Type
    III" errors were made by a committee that lacked
    minority membership to articulate minority
    perspectives: Not enough of the right research
    questions and issues of interest were raised. As a
    result, the document generally has become an
    apologia for testing (p. 92).

Professional groups representing black minorities have
been somewhat more active in their opposition to certain
testing practices. For example, the National Association for
the Advancement of Colored People (NAACP) held a conference
on minority testing in 1976. The report (Gallagher, 1976)
pointed to uses and misuses of tests, psychometric issues,
public policy, and a code to help ensure the fair use of
tests.

Perhaps the most influential group in the testing arena
has been the Association of Black Psychologists (ABP) with
their call for an immediate moratorium on the use of
psychological tests with children from disadvantaged
backgrounds. In a subsequent report, Williams (1971, p. 67)
noted that ability tests:

    1. Label black children as uneducable,

    2. Place black children in special classes,

    3. Potentiate inferior education,

    4. Assign black children to lower education tracks than
       whites,

    5. Deny black children higher educational
       opportunities, and

    6. Destroy positive intellectual growth and development
       of black children.

The ABP has continued to be quite active in their opposition
to standardized tests used for special education
classification in schools. The organization has spearheaded
suits against school districts (see discussion in Chapter 8).
Also, in part as a response to the ABP's proposed moratorium
on the use of psychological tests with blacks, the American
Psychological Association's Board of Scientific Affairs
formed an ad hoc committee to investigate the validity of
testing in educational settings. The committee report
covered a broad spectrum of issues, including theory of human
abilities, test misuse and misinterpretations, evaluation of
the "fairness" of tests in use, and alternatives to commonly
used intellectual tests (Cleary et al., 1975).

Reaction to the report from the ABP was quite negative.
Jackson (1975) noted:

    In this writer's judgment the report is blatantly
    racist. It continues to promulgate the notion of
    an "intellectual deficit" among black people, seeks
    to treat all disadvantaged in a similar manner, and
    employs a definition of "fairness" which is
    intrinsically unfair. It attempts to describe the
    retesting functions in a seemingly educationally
    desirable manner when in fact these functions serve
    to sustain and maintain the status quo while
    systematically prohibiting black self-actualization
    and self-determination and promulgating exclusion
    of blacks from the American mainstream. The
    committee appears to have ignored the wealth of
    work of black psychologists in this area. To
    discuss and scientifically discount is one thing;
    to totally ignore is racism at its arrogant worst
    (p. 88).

The APA committee chairman (Humphreys, 1975) wrote a
rebuttal to the ABP's reaction. Humphreys (1975) noted:

    The authors of the report also believe that test
    scores properly interpreted are useful. We do not
    and cannot support a moratorium on testing in the
    schools. Furthermore, many useful interpretations
    of test scores can be made without appreciable loss
    of accuracy in the absence of information about
    race, ethnic origin, or social class of the
    examinee. Whether demographic membership is needed
    is an empirical matter and not one decided on the
    basis of ideology (p. 95).

Nevertheless, the ABP has rejected the APA report and noted
that a moratorium is no longer enough; what is needed is
government intervention and sanctions against testing
practices.

American Personnel and Guidance Association

Guidance counselors and associated personnel are
sometimes involved in testing practices. This is often
associated with vocational or career assessment and may
involve minorities. At the 1970 annual convention of the
American Personnel and Guidance Association (APGA), the
Senate adopted a resolution in which concern was expressed
over minority group testing. Thereafter, the Association for
Measurement and Evaluation in Guidance (AMEG, a division of
APGA) prepared a position statement on the use of tests, and
with the assistance of AMEG, APGA, and the National Council
on Measurement in Education (NCME), a paper was adopted as an
official position of those organizations (AMEG, 1972). In
the document it was noted that:

    Professional associations, including the
    measurement societies, do not have the authority to
    control intentional discrimination against
    particular groups, though individual members acting
    in accordance with their own consciences may bring
    to bear such powers as their positions afford them
    (AMEG, 1972, p. 386).

In the document it is also stated that issues relating to
test misuse should go through the court system, boards of
education, civil service commissions, and other public
groups.

The National Education Association (NEA) has come out
against standardized tests. In 1972, the NEA's Center for
Human Relations held a three-day national conference in
Washington, D.C. The theme of the conference was "Tests and
Use of Tests--Violations of Human and Civil Rights" (Bosma,
1973). Individuals attending the conference were asked to
complete a questionnaire, including such items as:

    IQ tests are not perfectly accurate nor are they a
    perfect indication of potential.

    The IQ test is a measure of experience and learning
    rather than a measure of inborn ability.

    Most standardized tests are tests of developed
    abilities rather than measures of potential.

    Given the possible negative effects of standardized
    tests, which of the following actions do you
    believe should be taken?

    (a) Eliminate the use of standardized tests
        entirely.

    (b) Intensify efforts to develop culture free
        tests.

    (c) Curtail the use of standardized tests except
        for research purposes.

    (d) Conduct an intensive educational program to
        prevent misuses of tests (Cited in Jensen,
        1980, p. 13).

Following the meeting the NEA policy-making
Representative Assembly passed three resolutions (Oakland &
Laosa, 1977, pp. 22-23):

1. To encourage the elimination of group-standardized
intelligence, aptitude, and achievement tests until
completion of a critical appraisal, review, and revision of
current testing programs;

2. To direct the NEA to call immediately a national
moratorium on standardized testing and set up a task force on
standardized testing to research the topic and make its
findings available to the 1975 Representative Assembly for
further action; and

3. To request the NEA task force on testing to report
its findings and proposals at the 1973 Representative
Assembly.

In 1973 the NEA task force again called for a national
moratorium on standardized testing until 1975. The NEA
Representative Assembly also reviewed the moratorium
resolution on testing, suggesting that tests should not be
used in a manner that denies students full access to equal
educational opportunity.

National Association of School Psychologists

School psychologists are nearly always involved in
assessment of children in educational settings. Many of the
children who are referred for psychological or special
educational services represent various minority groups. The
National Association of School Psychologists (NASP) is one
professional organization representing practicing and
academic school psychologists in the United States and some
foreign countries. Because school psychologists are
frequently in an extremely sensitive position in school
assessment practices, the NASP delegate assembly (NASP, 1978)
has adopted a number of resolutions that have a bearing on
assessment (e.g., Resolutions 3, 6, and 8). For example,
Resolution 3 notes that school psychologists should protect
children, especially those in minority groups, from abuses
through the malpractice of school psychology.

Resolution 6 is more explicit in expressing the position
that blacks and other minority groups do not manifest an
inferiority in intellectual functioning based on so-called
genetic characteristics. The NASP has argued that there is
inadequate scientific support for genetic differences in
intelligence among groups and that research into the issue is
needed.

Resolution 8 notes that:

    Individuals of different socio-cultural backgrounds
    differ in their readiness to succeed in school;
    that professional members of minority groups have
    indicated that it is a disservice to minority
    individuals to suggest that they need not do well
    on tests or achieve a basic education; and that
    objective measures are less biased than subjective
    judgments in assigning children to special programs
    in schools (p. 104).

In addition to these resolutions, some specific
suggestions for standards relating to professional
involvement, assessment standards, standards for parent
and/or student involvement, standards for educational
programming and follow-through, and training standards follow
these resolutions (NASP, 1978, pp. 105-107).

Association for Advancement of Behavior Therapy

The Association for Advancement of Behavior Therapy
(AABT) represents practice and research interests of behavior
therapists. In May 1977, the Board of Directors of the AABT
adopted "Ethical Issues for Human Services." The guidelines
do not mention issues related to test bias. In fact, the
statements in the guidelines are conceptualized within the
domain of treatment (see Table 9.2).

Within contemporary behavior therapy, assessment and
treatment are conceptually linked (cf. Kratochwill, 1980,
1982), and so it is possible to apply any one of the
guidelines within the context of assessment practices.
Nevertheless, the AABT has shown increasing interest in
assessment practices, as reflected in the formation of the
journal Behavioral Assessment. Whether or not the
organization will make any formal statements on test bias
remains to be seen.

The AABT guidelines take on special significance in
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Table  9.2 
Ethical Issues for Human Services

The questions related to each issue have deliberately
been cast in a general manner that applies to all types of
interventions, and not solely or specifically to the practice
of behavior therapy. Issues directed specifically to
behavior therapists might imply erroneously that behavior
therapy was in some way more in need of ethical concern than
non-behaviorally-oriented therapies.

In the list of issues, the term "client" is used to
describe the person whose behavior is to be changed,
"therapist" is used to describe the professional in charge of
the intervention; "treatment" and "problem," although used in
the singular, refer to any and all treatments and problems
being formulated with this checklist. The issues are
formulated so as to be relevant across as many settings and
populations as possible. Thus, they need to be qualified
when someone other than the person whose behavior is to be
changed is paying the therapist, or when that person's
competence or the voluntary nature of that person's consent is
questioned. For example, if the therapist has found that the
client does not understand the goals or methods being
considered, the therapist should substitute the client's
guardian or other responsible person for "client," when
reviewing the issues below.

A. Have the goals of treatment been adequately considered?

    1. To insure that the goals are explicit, are they
       written?

    2. Has the client's understanding of the goals been
       assured by having the client restate them orally or
       in writing?

    3. Have the therapist and client agreed on the goals of
       therapy?

    4. Will serving the client's interests be contrary to
       the interests of other persons?

    5. Will serving the client's immediate interests be
       contrary to the client's long-term interest?

B. Has the choice of treatment methods been adequately
   considered?

    1. Does the published literature show the procedure to
       be the best one available for that problem?

    2. If no literature exists regarding the treatment
       method, is the method consistent with generally
       accepted practice?

    3. Has the client been told of alternative procedures
       that might be preferred by the client on the basis
       of significant differences in discomfort, treatment
       time, cost, or degree of demonstrated effectiveness?

    4. If a treatment procedure is publicly, legally, or
       professionally controversial, has formal
       professional consultation been obtained, has the
       reaction of the affected segment of the public been
       adequately considered, and have the alternative
       treatment methods been more closely reexamined and
       reconsidered?

C. Is the client's participation voluntary?

    1. Have possible sources of coercion on the client's
       participation been considered?

    2. If treatment is legally mandated, has the available
       range of treatments and therapists been offered?

    3. Can the client withdraw from treatment without a
       penalty or financial loss that exceeds actual
       clinical costs?

D. When another person or an agency is empowered to arrange
   for therapy, have the interests of the subordinated
   client been sufficiently considered?

    1. Has the subordinated client been informed of the
       treatment objectives and participated in the choice
       of treatment procedures?

    2. Where the subordinated client's competence to decide
       is limited, have the client as well as the guardian
       participated in the treatment discussions to the
       extent that the client's abilities permit?

    3. If the interests of the subordinated person and the
       superordinate persons or agency conflict, have
       attempts been made to reduce the conflict by dealing
       with both interests?

E. Has the adequacy of treatment been evaluated?

    1. Have quantitative measures of the problem and its
       progress been obtained?

    2. Have the measures of the problem and its progress
       been made available to the client during the
       treatment?

F. Has the confidentiality of the treatment relationship
   been protected?

    1. Has the client been told who has access to the
       records?

    2. Are records available only to authorized persons?

G. Does the therapist refer the clients to other
   therapists when necessary?

    1. If treatment is unsuccessful, is the client referred
       to other therapists?

    2. Has the client been told that if dissatisfied with
       the treatment, referral will be made?

H. Is the therapist qualified to provide treatment?

    1. Has the therapist had training or experience in
       treating problems like the client's?

    2. If deficits exist in the therapist's qualifications,
       has the client been informed?

    3. If the therapist is not adequately qualified, is the
       client referred to other therapists, or has
       supervision by a qualified therapist been provided?
       Is the client informed of the supervisory relation?

    4. If the treatment is administered by mediators, have
       the mediators been adequately supervised by a
       qualified therapist?


Source: Association for Advancement of Behavior Therapy.
Ethical issues for human services. Behavior Therapy, 1977,
v-vi.

light of bias in treatment of personality and behavior
disturbance (Reynolds, 1981). In this regard the guidelines
could have direct relevance in the assessment-treatment
process by providing conceptual guidelines.

The American Psychological Association,
American Educational Research Association,
and the National Council on Measurement
in Education

The APA has been actively involved in providing
standards for psychologists in academic and applied
settings. An early effort to address issues relating to
assessment of minority children occurred within the Society
for the Study of Social Issues (SSSI), Division 9 of the APA.
The SSSI published a monograph in which testing of minority
groups was discussed within the context of selection, use,
interpretation, and sensitivity to whether or not tests
differentiate reliably, validly, and are adequately
interpreted with minority group children (Deutsch, Fishman,
Kogan, North, & Whiteman, 1964).

As noted above, another document prepared at the request
of the APA's Board of Scientific Affairs, and entitled
"Educational Uses of Tests with Disadvantaged Students"
(Cleary et al., 1975), addressed several issues in testing
practices: (1) it presented a review of definitions of
abilities with special reference to general intelligence, (2)
it summarized some common classes of test misuse and
misinterpretation, (3) it reviewed the various kinds of
statistical information needed to use a test effectively, and
(4) it discussed existing alternatives to ability tests and
reviewed new types of tests and new information needed to
make more effective evaluations of students in schools.

The APA also developed various standards which have a
direct bearing on the assessment of individuals. For
example, the Ethical Standards of Psychologists (APA, 1972)
and Standards for Educational and Psychological Tests (APA,
1974) both contain guidelines on how tests are to be used and
developed. The Standards were first developed in 1954 (at
which time they were called Technical Recommendations for
Psychological Tests and Diagnostic Techniques) and were
endorsed by both the American Educational Research
Association (AERA) and the National Council on Measurement in
Education (NCME). Subsequently, the three organizations
cooperated in the development of the 1966 Standards for
Educational and Psychological Tests and Manuals, followed by
the 1974 Standards. These standards are presently being
revised again.

More recently, APA has endorsed several revisions in the
various documents relating to training and practice that have
a direct bearing on testing/assessment practices. These
documents include the Accreditation Handbook (APA, 1980),
Standards for Providers of Psychological Services (APA,
1981b), and the Ethical Principles of Psychologists (APA,
1981a).

The Accreditation Handbook emphasizes two issues
(Pryzwansky, 1982). First, the accreditation procedures and
various criteria are designed to govern accreditation of
doctoral level professional psychology programs, as well as
predoctoral internship programs. Second, the APA
accreditation promotes high quality training along a variety
of criteria including (1) institutional settings, (2)
cultural and individual differences, (3) training models and
curricula, (4) faculty, (5) facilities, and (6) practicum and
internship settings. These criteria are important within the
context of training practitioners in scientific findings in
assessment and treatment and are explicit with regard to the
proper training of professional psychologists to be sensitive
to cultural differences.

The Standards for Providers of Psychological Services
(1977) provides a uniform set of standards for psychological
practice. It specifies the minimally acceptable level of
quality assurance and performance for providers of
psychological services. The Standards (1977) are organized
around four sections that relate to a general category of
service delivery.

The Standards (1977) take precedence over the Specialty
Guidelines (1981), which relate to practice in each of the
four specialties of professional psychology (i.e., clinical,
counseling, industrial/organizational, and school). Each of
these Specialty Guidelines is written specifically for
practice in the specialty area, although there is much overlap
on some (e.g., clinical and school). For example, in the
Specialty Guidelines for the Delivery of Services by School
Psychologists, school psychological services include:

    Psychological and psychoeducational evaluation and
    assessment of the school functioning of children
    and young persons. Procedures include screening,
    psychological and educational tests (particularly
    individual psychological tests of intellectual
    functioning, cognitive development, affective
    behavior, and neuropsychological evaluations, with
    explicit regard for the context and setting in
    which the professional judgments based on
    assessment, diagnosis, and evaluation will be used
    (p. 672).

Ethical Principles of Psychologists (1981) provides 10
ethical principles in the areas of responsibility,
competence, moral and legal standards, public statements,
confidentiality, welfare of the consumer, professional
relationships, assessment techniques, research with human
participants, and care and use of animals.

While the Ethical Principles of Psychologists (1981)
contains material relating to the psychologist's general
practice, the Standards for Educational and Psychological
Tests expands on these by providing more detailed and
specific guidelines for test developers and users. These
guidelines apply to "any assessment procedure, device, or
aid--i.e., to any systematic basis for drawing inferences
about people" (p. 2). Although these standards do not
specifically deal with the concept of test bias, it can be
assumed that adherence to them will reduce bias in the
assessment process. However, a footnote in the Standards
indicates a formal position against any testing moratorium.

Miscellaneous Professional Associations
A number of professional groups have developed some
ethical guidelines for practice. It is likely that members
of some of these groups would have contact with students in
some sort of formal or informal assessment role. For
example, the American Psychiatric Association (APA) is one of
the oldest professional societies and each member is bound by
the ethical code of the medical profession as defined by the
Principles of Medical Ethics of the American Medical
Association.

Another group that has developed a code of ethics is the
National Association of Social Workers (NASW). This group,
established in 1955, has over 89,000 members, many of whom
work in schools or with school-age children. The NASW code
contains a preamble and six major sections. These sections
address standards of personal and professional conduct and
responsibilities to clients, colleagues, employees, and
society at large.

Both the APA and NASW codes are discussed in more detail
in The AAAS Professional Ethics Project (Chalk, et al.,
1980).

Summary and Conclusions
In this chapter we have provided an overview of
professional groups and organizations that have developed
some statement or code relevant for assessment or treatment
practices. It is clear that the various professional groups
differ widely on positions regarding testing/assessment
practices as well as the guidelines developed therefrom. In
some cases, statements of policy from one group (e.g., APA)
have been directly criticized by another (ABP). Positions on
both sides of the coin are often not based on empirical data.
Jensen (1980) has even identified an "anti-test syndrome"
with several features:

1.  Most critics of tests are indiscriminate in their
criticisms.

2.  To most test critics there is a mystique about the word
intelligence and a humanistic conviction that the most
important human attributes cannot be measured or dealt
with quantitatively or even understood in any
scientifically meaningful sense.

3.  Critics give no empirical basis for their criticisms of
tests, test items, or the uses of tests.

4.  Critics fail to suggest alternatives to tests--or ways of
improving mental measurement--or to come to grips with
the problems of educational and personnel selection or
the diagnosis of problems in school learning.

5.  Critics hardly ever mention the nonverbal and
nonscholastic types of mental tests. They inculcate the
notion that all intelligence tests simply tap word
knowledge, bookish information, and use of "good
English."

6.  Finally, criticisms are imbued with a sense of outrage
at purported social injustices either caused or
reinforced by tests.

Whether or not professional organizations will be able to
adopt an empirical perspective on tests in the future remains
to be determined.

A final issue concerns the relationship between various
professional organizations and the Supreme Court. Although
many professional organizations have been active in trying
to influence assessment practices, large and far-reaching
change in psychological and educational assessment practices
with minority children in educational settings did not occur
until impetus was provided by legislative and judicial sectors
(e.g., PL 94-142). On the other hand, some authors have
noted that the Court appears to be moving away from reliance
upon and deference to federal agency guidelines and toward
reliance upon professional standards (e.g., Standards for
Educational and Psychological Tests, 1974). Lerner (1978)
raised this issue at a time when courts were getting involved
in deciding cases relevant to minority group assessment.
Yet, as noted in Chapter 8, Judge Grady considered expert
testimony, but decided, on an individual basis, whether items
in the WISC, WPPSI, WISC-R, and Stanford-Binet were
culturally biased against black children. Thus, at this time
it is not at all clear whether or not courts in general or
the Supreme Court will rely more on professional associations
in rendering decisions on testing/assessment issues.

Chapter 10
Current Status

Psychological and educational assessment practices continue to increase
in popularity. In public schools alone, it is estimated that about one
quarter of a billion standardized tests are administered annually, many for
special education decisionmaking (Ysseldyke & Algozzine, 1982). In making
decisions about special education classification and placement, test data
are often collected without clear purpose and in ways not intended by their
developers (Ysseldyke, Algozzine, & Thurlow, 1980). This problem becomes
more acute when concerns for bias in the process are addressed. The
literature on bias has burgeoned in the last decade with divergent research
efforts examining various aspects of the problem. While some definite
progress in our understanding of bias in psychological and educational
assessment has been reported in some areas, progress in other areas has been
slow. As a consequence, there is much confusion in practice regarding
methods to reduce or eliminate bias.

It was the purpose of this project to review the various aspects of
the problem of bias in assessment and report our findings within a
conceptual framework that provides organization to the literature. In this
last chapter, we examine the various parts as they contribute to the whole
and make recommendations for continued research. In addition, we report on
the implications of what we now know about bias for special education
decision-making and discuss guidelines that can be employed given our
present understandings.

Definition of Bias
As a result of our review, it is apparent that a consensus definition
of bias needs to be adopted. This definition should be broad enough to
encompass all legitimate perspectives yet restricted in the sense that it
makes no premature assumptions that can otherwise be empirically
investigated. In addition, the definition should allow for the
determination of bias on empirical grounds. Given these parameters, our
proposed definition of bias excludes the notion that there should be
an a priori assumption that mean scores or distributions of scores
should be similar across groups. This assumption denies the possibility
that differences across groups are real differences. A denial of this
possibility may result in time and money being invested in the development
of tests and procedures that have less utility than those currently
employed.

The requirement that the definition allow for the determination
of bias on empirical grounds excludes those considerations that require
value judgments on the part of the decision-maker(s). The "social good"
or "social evil" that results from the process, while essential to
consider, is presently conceived as an issue of fairness and not bias.
Such a distinction allows developers of tests and assessment strategies
to employ common criteria to examine bias. It is the responsibility of
those who employ tests to determine the fairness of the strategies in the
situation in which they intend to use them. It is the responsibility of
test developers to make the distinction between bias and fairness clear,
and to offer guidelines that can be used by test consumers in evaluating
whether or not the instrument is being employed in a manner that is fair
from the consumer's point of view.

The lack of a clear distinction in the past has resulted in no
one being willing to accept responsibility for questionable practices.
Test developers have argued that their responsibility is to develop valid
and reliable tests, while test consumers have argued that they are only
employing tests as described by test developers. This situation can be
avoided by clearly articulating the distinction between the concepts of
bias and fairness and identifying who is responsible for each. National
associations can help by developing guidelines for their members in proper
test development and use. Such guidelines would also prove valuable to
the courts in examining issues of bias and fairness.

In keeping with the majority opinion as we believe it to be, bias
is presently defined through the use of the concept of validity. Our
definition maintains the notions of bias expressed in the technical test
bias and situational bias literatures and expands them to include our
conceptualization of outcome bias. From this perspective, bias is present
if: 1) there are differences across groups as commonly studied within the
context of content, construct, and predictive validity, 2) situations and
circumstances in which the assessment strategy is employed result in
differences in the maximal performance across groups, and 3) the use of
the strategy results in differences across groups in the effectiveness of
outcomes predicted from its use. Note that this latter statement expands
the study of bias to all data employed in decision-making and not just test
data.

Jensen (1980) reports that bias "refers to systematic errors in the
predictive validity or the construct validity of test scores of individuals
that are associated with the individual's group membership" (p. 375).
To this definition we add the notion of outcome bias, expand it to include
all forms of assessment data, rename the traditional concepts of predictive
and construct validity external and internal construct bias, respectively,
and include in our understanding of internal construct bias the notion of
situational bias. The resulting definition reads: Bias refers to systematic
errors in the internal or external construct validity, or outcome
validity, of tests and assessment strategies that are associated with an
individual's group membership.

Research Needs

As reported in earlier chapters, there has been substantial progress
made in our study of bias from a traditional psychometric perspective.
This literature suggests that little to no evidence of bias is found
in commonly employed measures of cognitive functioning. Research in the
area of internal construct bias has shown that these tests appear to be
measuring the same construct across groups with a high degree of accuracy.
In addition, there do not appear to be any particular types of items
that result in systematic error across groups, although more research
in this area utilizing latent trait statistics is needed (Cole, 1981).
Yet, even if biasing items are found, their elimination is not likely
to significantly contribute to a decrease in the differences now found
in performance across groups.

Research investigating group differences in performance on items
or clusters of items provides information on whether or not the same
constructs are being measured for all. It does not tell us to what
degree the construct is being measured across groups. In Chapter 5, we
examined some of the situational factors that have been investigated
to determine if differences across groups are related to differences
in the degree to which the construct is measured. Research in this
area has demonstrated that test scores can be manipulated by varying
situational factors. Consequently, it does not appear that tests of
cognitive functioning are measuring maximal performance. The issue of
whether or not there are differences in the degree to which these
situational factors can influence performance across groups is not
yet clear. More research is needed to determine the differential effect
of situational factors across groups.

In the study of external construct bias, psychological tests
commonly employed in decision-making have been shown either not to differ
in their prediction of external criteria or to overpredict the performance
of minority group members when a nonminority or common regression line
is employed. While these findings have been replicated across a variety
of tests, the external criteria that have been employed in externally
validating intelligence tests have been criticized. These studies have
commonly employed standardized measures of achievement, measures that
are considered by some to be measuring the same thing as intelligence
tests. Since learning is the major criterion that intelligence tests
are supposed to predict, research is needed that employs various criteria
of learning to externally validate these tests and study bias.
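
The regression-line comparison described above can be illustrated with a
short sketch. It is offered only as an illustration under our own
assumptions and is not the procedure of any study reviewed here: the
function names (fit_line, mean_residual_by_group) and the data are
hypothetical. One common regression line is fitted to everyone, and the
average of (actual minus predicted) criterion scores is then taken within
each group. A negative average means the common line overpredicts that
group; a positive average means it underpredicts.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares slope and intercept for a single predictor."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def mean_residual_by_group(scores, criterion, groups):
    """Fit one common line, then average (actual - predicted) per group."""
    slope, intercept = fit_line(scores, criterion)
    residuals = {}
    for x, y, g in zip(scores, criterion, groups):
        residuals.setdefault(g, []).append(y - (intercept + slope * x))
    return {g: mean(r) for g, r in residuals.items()}


# Made-up data for illustration only: both groups share the same slope,
# but group B's criterion scores sit two points below group A's.
scores = [1, 2, 3, 4, 1, 2, 3, 4]
criterion = [1, 2, 3, 4, -1, 0, 1, 2]
groups = ["A"] * 4 + ["B"] * 4

result = mean_residual_by_group(scores, criterion, groups)
for group, value in sorted(result.items()):
    label = "overpredicted" if value < 0 else "underpredicted"
    print(group, round(value, 2), label)
```

In this toy case the common line overpredicts group B and underpredicts
group A. A fuller analysis would also compare slopes and standard errors
of estimate across groups, which this sketch does not attempt.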

As reported in Chapter 6, the area in most need of research is
outcome bias. Most research efforts to date have focused on bias
in measuring constructs, not in the use of tests in predicting outcomes.
This appears to be a function of the way in which validity has been
defined. Indeed, not only is there little research with respect to
outcome bias, but the research evidence regarding outcome validity,
especially as it relates to intervention planning, has also been limited.
Research efforts specific to selection decisions have, for the most part,
been limited to selection in employment. The few research efforts that
have studied validity in selecting children in need of special help in
schools have been hampered by methodological problems. While it appears
that the predictive validity of IQ tests is reduced when measures of
school achievement as opposed to academic achievement are employed, the
research is equivocal with respect to bias in predicting school
achievement. This would be a fruitful area to research.

With respect to the validity of tests for intervention planning and
consequent bias as a result of this planning, much research is still
needed. As described in Chapter 6, those assessment strategies that
directly measure the behaviors of concern and are employed continuously
throughout the intervention have been the only type to offer empirical
evidence of outcome validity. No research reporting on intervention bias
was found.

Certainly, the notion of outcome validity is the most controversial
aspect of our definition of bias. There are those who maintain that the
purpose of tests is to measure constructs and that the validity and
consequent bias of tests need to be studied separately from the specific
use of the test. The argument continues that if tests are employed
inappropriately and, for example, children are selected inappropriately or
interventions are planned that are ineffective or biased, then the fault
is not with the test. Its job is to measure the construct, and it should
be judged on its ability to do so. While this argument has its appeal, we
see certain problems with it. Basically, we question the employment of
constructs unless there is evidence that their use will help in
decision-making. Consequently, we have argued that evidence for their use
should be part of the concepts of validity and bias.

Broadened conceptions of validity and bias connote that the
measurement of the construct is not an end in itself, but only a
means to an end. Indeed, the sole reason for inventing constructs
is to serve some purpose. A broadened notion of validity would require
that its purpose be empirically established. Consumers now turn to
psychological measures, evaluate whether or not they measure the
construct of interest, and then infer that their use is of value to them.
The knowledgeable consumer appreciates the inferences that are made and
uses caution. We would prefer that they use empirical evidence instead.
Such evidence, we believe, can best be generated if we change our
understanding of validity to incorporate outcome validity.

As discussed in the beginning of this chapter, a differentiation
needs to be made between bias and fairness. This is by no means an
attempt to downplay the importance of fairness. Ultimately, all
decisions regarding the employment of psychological and educational
assessment strategies rest on the decision-maker's beliefs about whether
or not the whole process is consistent with their values and/or the
values of those who employ them. At present, the judgment regarding
whether a decision-making process is fair or not is an unsystematic
exercise, if an exercise at all. Guidelines are needed that can make
this exercise systematic by having decision-makers think through whether
or not the product of their efforts is ultimately fair to all.

The models of selection reviewed in Chapter 6 were designed to
reflect the various philosophies of fairness. What is needed next is
a clear exposition of the models so that they can be understood and
employed by consumers. We recommend that the [illegible]
be promoted for consumer use since it can be employed regardless of
what fairness philosophy is adopted by decision makers. It would
provide a vehicle for making whatever adjustments to decisions are
necessary.

Among the various proposed alternatives to present practice,
several seem promising for further investigation and implementation.
Behavioral Assessment and Criterion-Referenced Testing appear to be the
most highly developed procedures that can have an immediate impact on
planning interventions to help children develop skills in which they
are deficient. Strategies such as Diagnostic/Clinical Teaching and
Child Development Observation still need to be researched to validate
their usefulness.

♦  * 

For diagnostic purposes, Learning Potential Assessment may hold
promise in its ability to measure learning as it occurs in assessment
rather than as a static, after-the-fact occurrence. While this may
reduce the potential impact of biasing historical factors, this is an
empirical question that has yet to be addressed. However, the most
active proponents of this method, Feuerstein and his colleagues, have
moved into using learning potential assessment as a method of identifying
skills for intervention. Much research on their procedures, identified
as Instrumental Enrichment, needs to be completed before the validity of
its use for intervention planning is established.

Those strategies that are purported to improve upon the diagnostic
and predictive capability of IQ tests show little evidence to support
their continued development at this time. Renorming to correct for
empirically established bias is a legitimate activity, but to
renorm a test on the a priori assumption that mean scores across groups
should be similar appears questionable from our perspective. Developing
culture-reduced tests also appears unwarranted. Culturally loaded items,
if any are found to bias responses across groups, can be eliminated from
present tests. Developing new tests with all items culture-reduced does
not appear to be worth the time or expense. In short, developing new
instruments for diagnosing and classifying appears to be misdirecting
our efforts by overemphasizing its importance in helping children.
Guidelines for Reducing Bias in Special Education Decision-Making
Our understanding of bias and what to do about it continues to develop
as we investigate it. What is it that can be done at present to help
implement what we know and what cautions can be exercised in areas that
are still under investigation? Tucker (1980) recommends a series of steps
that can be taken with regard to special education decision making that
address this question. "The list specifies nineteen points in the
appraisal process at which assessment data is (or should be) collected
and used in evaluating a student's program from a non-biased perspective"
(Tucker, 1980, p. 3). The steps are based on a series of questions that
decision-makers ask themselves throughout the process that leads up to
the classification and placement of children in special education classes.
The nineteen questions are listed below:

(1)  Is there a significant problem involving this student?

(2)  Is the problem worth taking time to pursue?

(3)  Does the initial observational data collected on a day
to day basis suggest that a significant problem exists?

(4)  Does the information gained from the parent or
guardian suggest the need for alternative classroom
intervention?

(5)  Do the observational data from Step 4 show that the
problem behavior persists even when alternative
classroom strategies are implemented?

(6)  Does the screening data suggest the need for other
alternative educational services?

(7)  Does the problem persist even when alternative regular
education alternatives (sic) are provided?

(8)  Have all steps 1 through 7 been taken and is all of
the resulting data on hand?

(9)  Have all the necessary questions been generated to
provide an adequate basis for planning the student's
educational program?

(10) After the assessment performed in Step 9, is there
sufficient evidence that the student is handicapped?

(11) Does the assessment data obtained in Step 10 supply
sufficient evidence that the student's problem is
educationally related to and supported by a
handicapping condition?

(12) Have all the assessment questions been answered to
the satisfaction of the Multidisciplinary Team?

(13) Is the Assessment Report jargon free and understandable
in that it communicates in simple, straightforward terms
to all who will be present at the I.E.P. meeting?

(14) Does the student appear to need special education?

(15) Is the student a member of a minority group or other
unique population?

(16) Are eligibility decisions free of cultural bias?

(17) Have all necessary precautions been taken to insure
that the student's educational needs can best be met by
the provision of special education services?

(18) Have the parents approved the student's placement and
the program as specified in the I.E.P.?

(19) Can we tell if the student's progress is satisfactory?

After each of the questions, Tucker provides discussion concerning how to
proceed depending on whether the answer is "yes" or "no".

As can be seen from the questions, Tucker has basically provided
guidelines for conducting a thorough assessment consistent with the
requirements embodied in P.L. 94-142. Though issues specifically related
to minority group assessment are not addressed in every step, it is implied
that if decision-makers are thorough in their method, potential bias can be
reduced. For example, question 3 may reduce biased impressions by requiring
that a problem be documented by observational data. Likewise, question 5
may reduce bias in labeling and placement by employing alternative classroom
strategies in an attempt to remedy the problem in a less intrusive way.

An interesting recommendation of Tucker is that the multidisciplinary
team include a member that is sensitive to the student's racial or cultural
group. Such a step may help in identifying potential biases in the team
that occur out of ignorance of what is "expected" of a child from a certain
racial or cultural group. Ultimately, the validity of all such impressions
needs to be empirically documented and the potential bias examined.

In preparing for a formal evaluation of minority children, Tucker
recommends that testing have information on language proficiency and
differences between expressive and receptive language. These recommendations
are consistent with P.L. 94-142 guidelines and are justifiable based on
the empirical literature. The languages of the child and the examiner
do make a difference in the assessment of bilingual children. While the
short-range predictive validity of IQ tests is as good for bilingual
students as it is for English speaking students, this finding appears to
result only because the criterion measure also requires English proficiency.
Evidence that there is a significant difference between verbal and
performance IQ measures within groups as a function of language proficiency
strongly suggests that tests that emphasize verbal abilities
are biased against bilingual students. Consequent recommendations that
performance measures be used to assess intellectual functioning of
bilingual students are sound.


Consistent with the law, Tucker (1980) also recommends that
substandard performance on a measure of adaptive behavior be a prerequisite
to further evaluation for mental retardation. This is also in keeping
with current definitions of mental retardation. If the identified
purpose for administering an IQ test is to diagnose retardation and the
diagnosis can only be accompanied by significantly substandard performance
in adaptive behavior, then IQ testing only needs to be performed with those
children who remain eligible after the administration of an adaptive
behavior measure. Of course, the converse of this statement is also
true. That is, if the child's performance on an IQ test does not qualify
him/her for classification then there is no need to administer an
adaptive behavior measure. It would seem less intrusive to the child




As reviewed in Chapter 7, certain adaptive behavior measures, such as
the ABIC and ABS-PS, have a degree of content and construct validity.
Predictive and outcome validities of these instruments, however, have
yet to be established.

The importance of establishing the validity of the use of IQ tests,
adaptive behavior measures, or both in the diagnosis and placement of
mentally handicapped children becomes highlighted when decisions concerning
proportional representation need to be made. If proportional representation
is determined to be fair by the decision makers, how does one best choose
the proportions of each group to be classified/placed? As discussed
previously, there are many models from which to choose in making this
decision. However, in this case, all models would require that we know
which data would provide the best prediction. If, for example, only 20%
of an EMR population is allowed to be black and there are more than that
percentage identified and placed through the use of IQ tests alone, would
it be a more valid strategy to rank the children to be chosen on IQ alone
or to rank and choose the number by adopting a measure of adaptive behavior
to combine with IQ testing? Indeed, as Fisher (1978) reports, the use
of the ABIC in combination with IQ tests would reduce by 60, 70, and 85%
the number of Anglo, Mexican-American, and black children, respectively,
that are classified EMR by IQ tests alone. Such a strategy may in and of
itself solve a district's problems. The answer, of course, has to be
based on what you are predicting and the validities of each procedure in
making these predictions. With respect to EMR diagnosis, the answer has
already been provided. By definition, EMR diagnosis now requires both,
and the courts concur. Reschly (1982) points to one source of validity
for using both IQ and adaptive behavior to predict classification. If
both are used, the prevalence of mental retardation in the schools will
drop to approximately 1 to 1.5%. This figure closely approximates the
percentage of adults identified mentally retarded through community
surveys (Tarjan, 1970, cited in Reschly, 1982). Whether or not the
placement that always follows such diagnosis has any outcome validity
is not known.
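The dual-criterion logic discussed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cutoff values, the `Child` record, the function names, and the sample scores are all hypothetical and are not drawn from Tucker, Fisher, Reschly, or any instrument manual. The sketch simply contrasts identification by IQ alone with identification that also requires substandard adaptive behavior, and reports the resulting percentage reduction.

```python
from dataclasses import dataclass

# Hypothetical cutoffs, chosen only for illustration; the text names none.
IQ_CUTOFF = 70
ADAPTIVE_CUTOFF = 70


@dataclass
class Child:
    name: str
    iq: int
    adaptive: int  # score on some adaptive behavior measure (illustrative scale)


def iq_only(children):
    """Children identified when IQ is the sole criterion."""
    return [c for c in children if c.iq < IQ_CUTOFF]


def dual_criterion(children):
    """Children identified only when both IQ and adaptive behavior are substandard."""
    return [c for c in iq_only(children) if c.adaptive < ADAPTIVE_CUTOFF]


def percent_reduction(children):
    """Percentage of IQ-only identifications removed by adding the second criterion."""
    before = len(iq_only(children))
    after = len(dual_criterion(children))
    return 0.0 if before == 0 else 100.0 * (before - after) / before


# Made-up scores, not data from any study.
sample = [
    Child("A", 65, 62),
    Child("B", 68, 85),
    Child("C", 72, 60),
    Child("D", 60, 90),
    Child("E", 66, 68),
]
```

Running `percent_reduction` separately for each group in a district's own data would give figures comparable in kind to those Fisher (1978) reports, although the sketch says nothing about whether either criterion has outcome validity.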

Tucker's (1980) nineteen steps are designed to address issues related
to potential bias in the process. He provides no guidelines regarding
fairness in selecting and placing students.

Sattler (1982) reports on several recommendations that have been
made for assessing ethnic minority children. Those that are consistent
with our understanding of the literature are reported below.

(1) Assessment should focus on discovering ways to help
children and not on ways to better classify/place. The
increased use of behavioral and criterion-referenced
measures designed to intervene on specific skills deficits
should be the focus of the assessments.

(2) Examiners should take the time to motivate children to
perform on tests. Seeking the cooperation of children
would help reduce problems reported in situational bias.

(3) A wide range of mental tests should be used
when assessing minority children. Given the
complex nature of intellectual functioning,
the more adequate the sample of skills assessed,
the more likely one is to gain a more valid and
reliable measure for any one child.

(4) Procedures for "testing the limits" (Sattler, 1974,
1982) may provide a better picture of the maximal
performance capabilities of a child than
standardized procedures alone.

(5) Emphasize or use exclusively nonlanguage performance
measures with bilingual children. Using bilingual
examiners who can allow the child the opportunity
to respond in the language they prefer is desirable.

(6) Clinicians should become knowledgeable of the
cultural and racial differences among children in
the community in which they work.

(7) Teachers should become sensitive to the educational
difficulties displayed by various minority group
children. Behaviors that connote one type of
problem for some groups may reflect a different
problem for others.

Concluding Remarks
One of the most fundamental needs in the area of nonbiased assessment
that has arisen from our review is the need for test consumers to
carefully examine the purposes for which they assess and test developers
to broaden their concept of validity and consequent study of bias to
provide consumers with the research information they need to select the
best strategies to fulfill these purposes. With respect to test
consumption, when one asks the basic questions regarding the purposes
of special education decision-making, one can't help but wonder why there
is the need to diagnose at all. If the purpose of special education is to
provide special help to children with learning problems, then we can't
help but ask the same question that has been reiterated over the last
few decades, why diagnose? With evidence continuing to mount regarding
the superfluous nature of the activity, time for change is imminent.
Such change is now being witnessed in several areas and it is not
radical to predict that educational classification as it is now conceived
will eventually disappear. Noncategorical special education placement
based on a child's educational needs rather than his classification will
hopefully prove to be the next major change in providing help to children.
Such a system would focus attention on collecting data to help children
rather than diagnosing them.

Once decision-makers decide on the purpose of their assessment activities,
they will need better data on the validity and unbiased nature of that data
to make the decisions they deem important. Such information can best be
obtained by expanding our understanding of the concept of validity to
include outcome validity. This would then allow for an examination of
selection and intervention bias.

As test consumers evaluate the purposes of their activities, their
own values, which ultimately impact on the decisions they make, also
need to be examined. Such an examination of the fairness
of their activities would openly address issues that are often only
casually addressed or ignored.

Footnotes
¹There is, however, evidence that use of projective techniques
has been declining (Klopfer & Taulbee, 1976; Reynolds & Sundberg, 1976).

²We express appreciation to Dr. Sandy Alper for her assistance in
writing sections of Chapter 3.

³In both the Sandoval (1979) and Oakland and Feigenbaum (1979) studies,
the digit span and coding subtests were not included. No internal
consistency statistic can be computed for these subtests.

⁴Jensen (1980) cautions that such conclusions must be tentative
since he did not have reliability coefficients for the reanalysis and,
consequently, could not make corrections for attenuation.

⁵These ranges were called dull normal and mentally defective
in the earlier editions of the Wechsler scales.

⁶This section represents a revised and updated version of a chapter
by Kratochwill (1982).

⁷The indirect-direct dimensions of behavioral assessment presented
here are not to be confused with the indirect-direct distinctions
commonly made between traditional and behavioral assessment (see, for
example, Hersen and Barlow, 1976, pp. 114-120).

⁸The BOI is available from Dr. Peter N. Alevizas, Department of
Psychology, Straub Hall, University of Oregon, Eugene, Oregon, 94703.

⁹The BCS is available through Research Press, Box 317741,
Champaign, Illinois, 61820.

¹⁰The O'Leary code is available through Dr. K. Daniel O'Leary,
Department of Psychology, State University of New York at Stony Brook,
Long Island, New York, 11794.

¹¹The Wahler code is available from Dr. Robert G. Wahler, Child
Behavior Institute, the University of Tennessee at Knoxville, Knoxville,
Tennessee, 37916.

¹²Establishing the reliability and validity of direct behavioral
assessment methods is more than a methodological issue. In the
Standards for Educational and Psychological Tests of the American
Psychological Association (1974) it is noted that ". . . the psychologist
who counts examples of a specific type of response in a behavior-
modification setting is as much responsible for the validity of his
interpretations of change or the basic reliability of his observations,
as is any other test user" (p. 4).

¹³In the National Labor Relations Board v. Detroit Edison case
the company administered psychological aptitude tests and used the
results to determine the eligibility of employees for promotion.
Although the union wanted access to the test protocols and answer sheets,
the company agreed only to turn over the material to a qualified psychologist
who would offer advice. The federal appeals court ruled that the union
had the right to examine the protocols given that they would not copy
or disclose them and would return them to the company.

¹⁴Following the original injunction the California State Department
of Education suspended the use of IQ tests in placing all children in EMR
classes.

¹⁵See Morris and Brown (1982) for a similar perspective in the use
of behavior modification with mentally retarded persons.

¹⁶School psychologists are also represented by Division 16
(School Psychology) of the American Psychological Association.

¹⁷At this writing the American Educational Research Association
has not developed a professional code of ethics related to practice.
However, in 1981 an ad hoc committee was formed to consider the
development of such a code. Nevertheless, the AERA has sponsored
a symposium entitled "The Testing of Black Students" that was
subsequently published (Miller, 1974).